Abstract

We study the following tree search problem: in a given tree T = (V, E) a node has been marked and
we want to identify it. In order to locate the marked node, we can use edge queries. An edge query e
asks in which of the two connected components of T \ e the marked node lies. The worst-case scenario,
where one is interested in minimizing the maximum number of queries, is well understood, and linear-time
algorithms are known for finding an optimal search strategy [Onak et al. FOCS'06, Mozes et al.
SODA'08]. Here we study the more involved average-case analysis: a function w : V → Z+ is given
which defines the likelihood for a node to be the one marked, and we want the strategy that minimizes
the expected number of queries. Prior to this paper, very little was known about this natural question,
and the complexity of the problem had so far remained open.

We close this question and prove that the above tree search problem is NP-complete even for the
class of trees with diameter at most 4. This results in a complete characterization of the complexity
of the problem with respect to the diameter. In fact, for diameter at most 3 the problem
can be shown to be polynomially solvable using a dynamic programming approach.

In addition, we prove that the problem is NP-complete even for the class of trees of maximum
degree at most 16. To the best of our knowledge, the only known result in this direction is that the
tree search problem is solvable in O(|V| log |V|) time for trees with degree at most 2 (paths).

We match the above complexity results with a tight algorithmic analysis. We first show that a
natural greedy algorithm attains a 2-approximation. Furthermore, for bounded-degree instances,
we show that any optimal strategy (i.e., one that minimizes the expected number of queries) performs
at most O(Δ(T)(log |V| + log w(T))) queries in the worst case, where w(T) is the sum of the likelihoods
of the nodes of T and Δ(T) is the maximum degree of T. We combine this result with a non-trivial
exponential-time algorithm to provide an FPTAS for trees with bounded degree.



1 Introduction 



Searching is one of the fundamental problems in Computer Science and Discrete Mathematics. In his
classical book [20], D. Knuth discusses many variants of the searching problem, most of them dealing
with totally ordered sets. There has been some effort to extend the available techniques for searching
and for other fundamental problems (e.g., sorting and selection) to handle more complex structures such
as partially ordered sets [26] [?] [29] [?] [8]. Here, we focus on searching in structures that lie between
totally ordered sets and the most general posets: we wish to efficiently locate a particular node in a tree.

More formally, as input we are given a tree T = (V, E) which has a 'hidden' marked node and a
function w : V → Z+ that gives the likelihood of a node being the one marked. In order to discover
which node of T is marked, we can perform edge queries: after querying the edge e ∈ E we receive an
answer stating in which of the two connected components of T \ e the marked node lies. To simplify our
notation, let us assume that our input tree T is rooted at a node r, so that we can specify a query to an
edge e = uv, with u being the parent of v, by referring to v.

A search strategy is a procedure that decides the next query to be posed based on the outcome of
the previous queries. Every search strategy for a tree T = (V, E) (or for a forest) can be represented by
a binary search (decision) tree D such that a path from the root of D to a leaf ℓ indicates which queries
should be made at each step to discover that A(ℓ) is the marked node. More precisely, a search tree for T
is a triple D = (N, E', A), where N and E' are the nodes and edges of a binary tree and the assignment
A : N → V satisfies the following properties: (a) for every node v of V there is exactly one leaf ℓ in D
such that A(ℓ) = v; (b) [search property] if v is in the right (left) subtree of u in D, then A(v) is (not) in
the subtree of T rooted at A(u). For an example we refer to Figure 1.

Given a search tree D for T, let d(u, v) be the length (in number of edges) of the path from u to v in
D. Then the cost of D, or alternatively the expected number of queries of D, is given by

$$\mathrm{cost}(D) = \sum_{v \in \mathrm{leaves}(D)} d(\mathrm{root}(D), v)\, w(A(v)).$$
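The following Python sketch evaluates this cost formula on a toy instance. The `Leaf`/`Query` classes and the `cost` function are hypothetical names chosen for illustration, not notation from the paper: a query node's `yes` child plays the role of the right subtree in the search property (b) (the marked node lies in the subtree of the queried node), and its `no` child plays the role of the left subtree.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    node: str  # the node A(leaf) of T identified at this leaf


@dataclass
class Query:
    node: str           # query the edge between `node` and its parent in T
    no: "SearchTree"    # left subtree: marked node is NOT below `node`
    yes: "SearchTree"   # right subtree: marked node IS below `node`


SearchTree = Union[Leaf, Query]


def cost(d: SearchTree, w: Dict[str, int], depth: int = 0) -> int:
    """cost(D) = sum over the leaves of depth(leaf) * w(A(leaf))."""
    if isinstance(d, Leaf):
        return depth * w[d.node]
    return cost(d.no, w, depth + 1) + cost(d.yes, w, depth + 1)


# Path r - a - b rooted at r: first ask about a, then (on "yes") about b.
w = {"r": 1, "a": 2, "b": 3}
D = Query("a", no=Leaf("r"), yes=Query("b", no=Leaf("a"), yes=Leaf("b")))
print(cost(D, w))  # 1*1 + 2*2 + 2*3 = 11
```

Asking about b first instead gives cost 9 on the same instance, which shows that the order of the queries matters even on a path.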

Therefore, our problem can be stated as follows: given a rooted tree T = (V, E) with |V| = n and
a function w : V → Z+, the goal is to compute a minimum-cost search tree for T. This is a natural
generalization of the problem of searching for an element in a sorted list with non-uniform access probabilities.

The State of the Art. The variant of the problem in which the goal is to minimize the number of edge
queries in the worst case, rather than the expected number of queries, has been studied in
several recent papers [5, 29, 28]. It turns out that an optimal (worst-case) strategy can be found in linear
time [28]. This is in great contrast with the state of the art (prior to this paper) for the average-case
minimization we consider here. The known results amount to the O(log n)-approximation obtained by
Kosaraju et al. [21] and by Adler and Heeringa [2] for the much more general binary identification problem,
and the constant-factor approximation algorithm that two of the authors gave in [23]. However, the
complexity of the average-case minimization of the tree search problem has so far remained unknown.

Our Results. We significantly narrow the gap of knowledge in the complexity landscape of the tree
search problem from two different points of view. We prove that this problem is NP-complete even for
the class of trees with diameter at most 4. This results in a complete characterization of the problem's
complexity with respect to the parametrization in terms of the diameter. In fact, the problem can be
shown to be polynomially solvable for the class of trees of diameter at most 3. We also show that the
tree search problem under average-case minimization is NP-complete for trees of degree at most 16 (note
that in any infinite class of trees either the diameter or the degree is unbounded). This substantially
improves upon the state of the art, the only known result in this direction being an O(n log n) time
solution [16] for the class of trees with maximum degree 2. The hardness results are obtained by
fairly involved reductions from Exact 3-Set Cover (X3C) with multiplicity 3 [13].

In addition to the complexity results, we also significantly improve the previously known results from the
algorithmic perspective. We first show that a 2-approximation can be attained by a simple greedy approach
that always seeks to divide the remaining tree as evenly as possible. For bounded-degree trees, we match
the new hardness results with an FPTAS. In order to obtain the FPTAS, we first devise a non-trivial
dynamic programming algorithm that, roughly speaking, computes the best possible search tree
among the search trees with height at most H, in O(n^2 2^{O(H)}) time. Then, we show that every tree T admits
a minimum-cost search tree whose height is O(Δ · (log n + log w(T))), where Δ is the maximum degree
of T and w(T) is the total weight of the nodes in T. This bound is of independent interest because
the height of any search tree for a complete tree of degree Δ is Ω((Δ / log Δ) · log n). Furthermore, it allows us
to execute the DP algorithm with H = c · Δ · (log n + log w(T)), for a suitable constant c, obtaining a
pseudo-polynomial time algorithm for trees with bounded degree. By scaling the weights w in a fairly
standard way we obtain the FPTAS.

The worst-case scenario has also been studied for the case where a question is posed to some node u
and the answer is either that u is the marked node or in which connected component of the forest T \ {u}
the marked node lies [30] [?]. We remark that our techniques can be adapted to prove that, for the
average-case minimization, this "node query" variant of the tree search problem is also NP-complete;
furthermore, we can provide a (degree-independent) FPTAS for it. Due to space constraints we
defer these results to the full version of the paper.

Other Related Work. Besides the above-mentioned papers, the worst-case version of searching in
trees had already been studied and solved under a different name one decade ago, as pointed out by
Dereniowski [10]. This is because the problem of searching for a node in a tree is equivalent to the problem
of ranking the edges of a tree [19] [?].

The problem studied here can also be seen as a particular case of the binary identification problem
(BIP) [12]. Suppose we are given a set of elements U = {u_1, ..., u_n}, a set of tests {t_1, ..., t_m} with
t_i ⊆ U, a 'hidden' marked element and a likelihood function w : U → R+. A test t allows us to determine
whether the marked element is in the set t or in U \ t. The BIP consists of defining a strategy (decision tree)
that minimizes the (expected) number of tests needed to find the marked element. Both the average-case and
the worst-case minimization are NP-complete [17], and neither of them admits an o(log n)-approximation
unless P = NP [?] [?]. For both versions, simple greedy algorithms attain an O(log n)-approximation
[21] [?] [2]. When we impose some structure on the set of tests we obtain interesting particular cases. If
the set of tests consists of all the subsets of U (i.e., 2^U), then the strategy that minimizes the average
cost is a Huffman tree. Let G be a DAG with vertex set U. If the set of tests is {t_1, ..., t_n}, where
t_i = {u_j : u_j is reachable from u_i in G}, then we have the problem of searching in a poset [27] [?] [6]. When G is a
directed path we have the alphabetic coding problem [16]. The problem we study here corresponds to
the particular case where G is a directed tree.

Applications. The problem of searching in posets (and in particular in trees) has practical applications
in file system synchronization and software testing, according to [5] [28].

Strategies for searching in trees also have potential applications to asymmetric communication
protocols [1] [3] [15] [22] [?]. In this scenario, a client has to send a binary string x ∈ {0, 1}^t to the server, where x
is drawn from a probability distribution 𝒫 only available to the server. The asymmetry comes from the
client having much larger bandwidth for downloading than for uploading. In order to benefit from this
discrepancy, both parties agree on a protocol to exchange bits until the server learns the string x, trying
to minimize the number of bits sent by the client (though other factors, e.g., the number of rounds, should
also be taken into account). In one of the first protocols [3, 22], at each round the server sends a binary
string y and the client replies with 0 or 1 depending on whether y is a prefix of x or not. Based on the
client's answer, the server updates his knowledge about x and sends another string if he has not learned
x yet. This protocol corresponds to a strategy for searching a marked leaf in a complete binary tree of
height t, where only the leaves have non-zero probability. In fact, the binary strings in {0, 1}^{≤t} can be
represented by a complete binary tree of height t where every edge that connects a node to its left (right)
child is labeled with 0 (1). This gives a 1-1 correspondence between non-empty binary strings of length at most t
and edges of the tree, and the message y sent by the server naturally corresponds to an edge query.
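As a hedged illustration of this correspondence, the Python sketch below simulates the simplest server strategy: it ignores the distribution 𝒫 entirely and extends the known prefix by one bit per round, so the client always sends exactly t bits. The function name `learn_string` and the assumption that the server knows t are ours. A strategy that exploits 𝒫, that is, a good average-case tree search strategy, would aim to make the client send fewer bits in expectation.

```python
from typing import Tuple


def learn_string(x: str) -> Tuple[str, int]:
    """Naive server strategy: learn x one bit at a time with prefix queries.

    The query "is y a prefix of x?" is an edge query in the complete binary
    tree of height t = len(x): the string y names the edge entering node y.
    """
    t = len(x)                       # assumption: the server knows t
    prefix, client_bits = "", 0
    while len(prefix) < t:
        y = prefix + "0"             # edge from node `prefix` to its left child
        is_prefix = x.startswith(y)  # the client's one-bit answer
        client_bits += 1
        prefix = y if is_prefix else prefix + "1"
    return prefix, client_bits


print(learn_string("1011"))  # ('1011', 4)
```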
2 Hardness



In this section we shall prove that the tree search problem defined above is NP-complete. We shall use
a reduction from the Exact 3-Set Cover problem with multiplicity bounded by 3, i.e., each element of the
ground set can appear in at most 3 sets.

An instance of the 3-bounded Exact 3-Set Cover problem (X3C) is defined by: (a) a set U =
{u_1, ..., u_n}, with n = 3k for some k ≥ 1; (b) a family X = {X_1, ..., X_m} of subsets of U such
that |X_i| = 3 for each i = 1, ..., m and, for each j = 1, ..., n, u_j appears in at most 3 sets of
X. Given an instance I = (U, X), the X3C problem is to decide whether X contains a partition of U, i.e.,
whether there exists a family C ⊆ X such that |C| = k and ⋃_{X ∈ C} X = U. This problem is well known to
be NP-complete [13].
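For very small instances, X3C can be decided by exhaustive search, which is handy for sanity-checking the reduction below. This Python sketch (the name `has_exact_cover` is ours) tries every subfamily of k = n/3 sets. It is exponential and is meant only as a reference check, not as part of the reduction.

```python
from itertools import combinations
from typing import Iterable, List, Set


def has_exact_cover(universe: Iterable[str], family: List[Set[str]]) -> bool:
    """Decide X3C by brute force (exponential; for tiny instances only)."""
    universe = set(universe)
    if len(universe) % 3 != 0:
        return False
    k = len(universe) // 3
    for chosen in combinations(family, k):
        # k sets of size 3 whose union has 3k elements are pairwise disjoint
        if set().union(*chosen) == universe:
            return True
    return False


X = [set("abc"), set("bcd"), set("def"), set("bef")]
print(has_exact_cover("abcdef", X))  # True: {a,b,c} and {d,e,f}
```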

For our reduction it will be crucial to define an order among the sets of the family X. Any total order
< on U, say u_1 < u_2 < ... < u_n, can be extended to a total order ≺ on X ∪ U by stipulating that:
(a) for any X = {x_1, x_2, x_3}, Y = {y_1, y_2, y_3} ∈ X (with x_1 < x_2 < x_3 and y_1 < y_2 < y_3), the relation
X ≺ Y holds if and only if the sequence x_3 x_2 x_1 is lexicographically smaller than y_3 y_2 y_1; (b) for every
j = 1, ..., n and X ∈ X, the relation u_j ≺ X holds if and only if the sequence u_j u_1 u_1 is lexicographically smaller
than x_3 x_2 x_1, i.e., if and only if u_j ≤ x_3.

Assume an order < on U has been fixed and ≺ is its extension to U ∪ X, as defined above. We denote
by Π = (π_1, ..., π_{n+m}) the sequence of elements of U ∪ X sorted in increasing order according to ≺. From
now on, w.l.o.g., we assume that according to < and ≺ it holds that u_1 < ... < u_n and X_1 ≺ ... ≺ X_m.
For each i = 1, ..., m, we shall denote the elements of X_i by u_{i1}, u_{i2}, u_{i3} so that u_{i1} < u_{i2} < u_{i3}.

Example 1. Let U = {a, b, c, d, e, f} and X = {{a, b, c}, {b, c, d}, {d, e, f}, {b, e, f}}. Then, fixing the
standard alphabetical order among the elements of U, the sets of X are ordered as follows:
X_1 = {a, b, c}, X_2 = {b, c, d}, X_3 = {b, e, f}, X_4 = {d, e, f}. Then we have Π = (π_1, ..., π_10) =
(a, b, c, X_1, d, X_2, e, f, X_3, X_4).
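The order ≺ and the sequence Π can be computed mechanically. The Python sketch below, with hypothetical helper names `build_pi` and `longest_run_of_sets`, encodes rule (a) by the sort key x_3 x_2 x_1 and rule (b) by the key u u_1 u_1, and reproduces Π for Example 1. It also measures the longest run of consecutive sets in Π, the quantity the bounded-degree construction relies on.

```python
from typing import List, Sequence, Set, Union

Item = Union[str, frozenset]


def build_pi(elements: Sequence[str], sets: List[Set[str]]) -> List[Item]:
    """Sort U ∪ X by the extension ≺ of the order on U.

    A set X = {x1 < x2 < x3} is compared through the sequence x3 x2 x1, and an
    element u through the sequence u u1 u1, where u1 is the smallest element.
    """
    rank = {u: i for i, u in enumerate(elements)}  # elements listed in < order
    first = rank[elements[0]]

    def key(item: Item) -> tuple:
        if isinstance(item, frozenset):
            return tuple(sorted((rank[x] for x in item), reverse=True))
        return (rank[item], first, first)

    items: List[Item] = list(elements) + [frozenset(s) for s in sets]
    return sorted(items, key=key)  # tuples compare lexicographically


def longest_run_of_sets(pi: List[Item]) -> int:
    """Length of the longest block of consecutive sets in Π."""
    best = run = 0
    for item in pi:
        run = run + 1 if isinstance(item, frozenset) else 0
        best = max(best, run)
    return best


pi = build_pi("abcdef", [set("abc"), set("bcd"), set("def"), set("bef")])
print(["".join(sorted(p)) if isinstance(p, frozenset) else p for p in pi])
# ['a', 'b', 'c', 'abc', 'd', 'bcd', 'e', 'f', 'bef', 'def']
print(longest_run_of_sets(pi))  # 2
```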

Because of the orders we fixed and the fact that each element of U appears in at most 3 sets of X, it
follows that no more than three sets of X can appear consecutively in Π. Indeed, by rule (b), the sets that
appear between u_j and u_{j+1} (or after u_n) are exactly those whose largest element is u_j (respectively u_n).
This will be important to prove hardness for bounded-degree instances.

We shall first show a polynomial-time reduction that maps any instance I = (U, X) of 3-bounded X3C
to an instance I' = (T, w) of the tree search problem such that T has diameter 4 but unbounded degree.
We will then modify this reduction to show hardness for the bounded-degree case too.

The structure of the tree T. The root of T is denoted by r. For each i = 1, ..., m, the set X_i ∈ X
is mapped to a tree T_i of height 1, with root r_i and leaves t_i, s_{i1}, s_{i2}, s_{i3}. In particular, for j = 1, 2, 3,
we say that s_{ij} is associated with the element u_{ij}. We make each r_i a child of r. For i = 1, ..., m,
we also create four leaves a_{i1}, a_{i2}, a_{i3}, a_{i4} and make them children of the root r. We also define 𝒳_i =
{t_i, s_{i1}, s_{i2}, s_{i3}, a_{i1}, ..., a_{i4}} to be the set of leaves of T associated with X_i. For the example given above,
the corresponding tree is given in Figure 2.
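A quick way to check that T has diameter 4 and unbounded degree is to build it explicitly. In this Python sketch the tree is stored as a parent map; the tuple labels such as ("t", i) for t_i are our own naming, and only the shape of T (not its weights) is modelled.

```python
from collections import deque
from typing import Dict, Hashable

Node = Hashable


def build_tree(m: int) -> Dict[Node, Node]:
    """Parent map of the diameter-4 tree T built from an X3C instance with m sets.

    Labels such as ("t", i) are illustrative names for t_i, not the paper's.
    """
    parent: Dict[Node, Node] = {}
    for i in range(1, m + 1):
        parent[("r", i)] = "r"              # r_i, root of T_i, hangs from r
        parent[("t", i)] = ("r", i)         # leaf t_i of T_i
        for j in (1, 2, 3):
            parent[("s", i, j)] = ("r", i)  # leaf s_ij, associated with u_ij
        for k in (1, 2, 3, 4):
            parent[("a", i, k)] = "r"       # leaves a_i1..a_i4 hang from r
    return parent


def diameter(parent: Dict[Node, Node]) -> int:
    """Tree diameter (in edges) via two breadth-first searches."""
    adj: Dict[Node, list] = {}
    for child, par in parent.items():
        adj.setdefault(child, []).append(par)
        adj.setdefault(par, []).append(child)

    def farthest(src: Node):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        far = max(dist, key=dist.get)
        return far, dist[far]

    end, _ = farthest("r")
    return farthest(end)[1]


T = build_tree(4)                               # Example 1 has m = 4 sets
print(len(T) + 1, diameter(T))                  # 37 nodes, diameter 4
print(sum(1 for p in T.values() if p == "r"))   # degree of r: 5m = 20
```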

The weights of the nodes of T. Only the leaves of T will have non-zero weight, i.e., we set w(r) =
w(r_1) = ... = w(r_m) = 0. While defining the weights of the leaves of T it will be useful to assign a weight
also to each u ∈ U. In particular, our weight assignment will be such that each leaf of T which is associated
with an element u is assigned the same weight we assign to u. Also, when we fix the weight of u
we shall understand that we are fixing the weight of all leaves in T associated with u. We extend the
function w(·) to sets, so the weight of a set is the total weight of its elements. Also, we define the weight
of a tree as the total weight of its nodes.

The weights will be set so as to force any optimal search tree for (T, w) to have a well-defined
structure. The following notions of configuration and realization will be useful to describe such a
structure. In describing the search tree we shall use q_v to denote the node
of the search tree under consideration that represents the question about the node v of the input tree
T. Moreover, we shall in general only be concerned with the part of the search tree meant to identify
the nodes of T of non-zero weight. The search tree can be easily completed by
appending the remaining queries at the bottom.

Definition 1. Given leaves ℓ_1, ..., ℓ_h of T, a sequential search tree for ℓ_1, ..., ℓ_h is a search tree of height
h whose left path is q_{ℓ_1}, ..., q_{ℓ_h}. This is the strategy that asks about one leaf after another until they have
all been considered. See Figure 3(a) for an example.

Configurations and Realizations of Π. For each i = 1, ..., m, let D_i^A be the search tree with root q_{r_i},
whose right subtree is the sequential search tree for t_i, s_{i3}, s_{i2}, s_{i1} and whose left subtree is a sequential
search tree for (some permutation of) a_{i1}, ..., a_{i4}. We also refer to D_i^A as the A-configuration for X_i.

Moreover, let D_i^B be the search tree with root q_{t_i} whose left subtree is a sequential search tree for
(some permutation of) a_{i1}, ..., a_{i4}. We say that D_i^B is the B-configuration for X_i. See Figure 3(b)-(c).

Definition 2. Given two search trees τ_1, τ_2, the extension of τ_1 with τ_2 is the search tree obtained by
appending the root of τ_2 to the leftmost leaf of τ_1. The extension of τ_1 with τ_2 is a new search tree that
'acts' like τ_1 and, in case of all NO answers, continues following the strategy represented by τ_2.

Definition 3. A realization (of Π) with respect to Y ⊆ X is a search tree for (T, w) defined recursively
as follows. For each i = 1, ..., n + m, a realization of π_i ... π_{n+m} is an extension of the realization
of π_{i+1} ... π_{n+m}¹ with another tree T' chosen according to the following two cases:

Case 1. If π_i = u_j for some j = 1, ..., n, then T' is a (possibly empty) sequential search tree for the
leaves of T that are associated with u_j and are not queried in the realization of π_{i+1}, ..., π_{n+m}.
Case 2. If π_i = X_j for some j = 1, ..., m, then T' is D_j^B if X_j ∈ Y, and D_j^A otherwise.

We denote by D^A the realization of Π w.r.t. the empty family, i.e., Y = ∅. Figure 4 shows some of
the realizations for Example 1 above.

We are going to set the weights in such a way that every optimal solution is a realization of Π w.r.t.
some Y ⊆ X (Lemma 1). Moreover, these weights will allow us to discriminate between the cost of
realizations w.r.t. an exact cover for the X3C instance and the cost of any other
realization of Π. Let D* be an optimal search tree and Y be such that D* is a realization of Π w.r.t. Y.²
In addition, for each u ∈ U define W_u = Σ_{π_k ≺ u} w(π_k), where w(π_k) denotes the total weight of the
leaves associated with π_k (cf. Remark 1). It is not hard to see that the difference between
the cost of D^A and D* can be expressed as follows:
$$\mathrm{cost}(D^A) - \mathrm{cost}(D^*) = \sum_{X_i \in Y} \Big( w(t_i) - (W_{u_{i1}} + W_{u_{i2}} + W_{u_{i3}}) - \sum_{j=1}^{3} d^*(q_{s_{ij}})\, w(u_{ij}) \Big) \qquad (1)$$



where d*(q_{s_{ij}}) is the difference between the level of the node q_{s_{ij}} in D* and its level in a realization
of Π w.r.t. Y \ {X_i}. To see this, imagine turning D^A into D* one step at a time, each step
changing the configuration from A to B for the set of leaves 𝒳_i of some X_i ∈ Y. Such a step implies:
(a) moving the question q_{s_{ij}} exactly d*(q_{s_{ij}}) levels down, thus increasing the cost by d*(q_{s_{ij}}) w(u_{ij});
(b) because of (a), all the questions that were below the level to which q_{s_{ij}} is moved are also moved one
level down; this additional increase in cost is accounted for by the W_{u_{ij}}'s; (c) moving the question
about t_i one level up, thus gaining w(t_i).

We will define the weight of t_i in order to compensate the increase in cost (a)-(b) due to the relocation
of q_{s_{ij}}, and to provide some additional gain only when Y is an exact cover. In general, the value of d*(q_{s_{ij}})
depends on the structure of the realization for Y \ {X_i}; in particular, on the length of the sequential
search trees for the leaves associated with the u_k's that appear in Π between u_{ij} and X_i. However, when Y is
an exact cover, each such sequential search tree has length one. A moment's reflection shows that in this
case d*(q_{s_{ij}}) = γ(i, j), where, for each i = 1, ..., m and j = 1, 2, 3, we define

$$\gamma(i, j) = j - 5 + |\{u_k : u_{ij} \prec u_k \prec X_i\}| + 5 \cdot |\{X_k : u_{ij} \prec X_k \preceq X_i\}|.$$

¹ For the sake of definiteness we set π_{n+m+1} = ∅ and the realization of π_{n+m+1} w.r.t. Y to be the empty tree.
² The existence of such a Y will be guaranteed by Lemma 1.
To see this, assume that Y is an exact cover. Let D' be the realization for Y \ {X_i}, and let ℓ be the level
of the root of the A-configuration for X_i in D'. The node q_{s_{ij}} is at level ℓ + (5 − j) in D'. In D* the root
of the B-configuration for X_i is also at level ℓ. Moreover, between level ℓ and the level of q_{s_{ij}} there are
only nodes associated with elements π_k such that u_{ij} ⪯ π_k ⪯ X_i. Precisely, there is 1 level for each u_k
such that u_{ij} ≺ u_k ≺ X_i (corresponding to the sequential search tree for the only leaf associated with u_k), and
5 levels for each X_k such that u_{ij} ≺ X_k ⪯ X_i (corresponding to the left path of the A- or B-configuration for
X_k). In total, the difference between the levels of q_{s_{ij}} in D* and D' is exactly γ(i, j).
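The quantity γ(i, j) depends only on Π, so it can be evaluated directly. The following Python sketch (the function name `gamma` is ours) implements the formula above, obtaining u_{ij} as the j-th smallest element of X_i in Π, and evaluates it on Π from Example 1. Note that it counts X_i itself among the sets X_k with u_{ij} ≺ X_k ⪯ X_i, which accounts for the 5 levels of the B-configuration of X_i.

```python
from typing import List, Union

Item = Union[str, frozenset]


def gamma(pi: List[Item], X: frozenset, j: int) -> int:
    """γ(i, j) for the set X = X_i of Π and j in {1, 2, 3}:
    j - 5 + |{u_k : u_ij ≺ u_k ≺ X_i}| + 5 |{X_k : u_ij ≺ X_k ⪯ X_i}|.
    """
    pos = {item: p for p, item in enumerate(pi)}
    u_ij = sorted(X, key=pos.__getitem__)[j - 1]  # j-th smallest element of X_i
    between = pi[pos[u_ij] + 1 : pos[X]]         # items strictly between u_ij and X_i
    n_elements = sum(1 for x in between if not isinstance(x, frozenset))
    n_sets = sum(1 for x in between if isinstance(x, frozenset)) + 1  # + X_i itself
    return j - 5 + n_elements + 5 * n_sets


# Π from Example 1.
X1, X2, X3, X4 = (frozenset(s) for s in ("abc", "bcd", "bef", "def"))
pi = ["a", "b", "c", X1, "d", X2, "e", "f", X3, X4]
print([gamma(pi, X1, j) for j in (1, 2, 3)])  # [3, 3, 3]
print([gamma(pi, X2, j) for j in (1, 2, 3)])  # [8, 8, 3]
```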

Note that γ(i, j) is well defined even if there is no exact cover Y ⊆ X. This quantity will be
used to define w(t_i).

We are now ready to provide the precise definition of the weight function w. We start with w(u_1) = 1.
Then we fix the remaining weights inductively, following the sequence Π: let i > 1 and
assume that, for each i' < i, the weights of all leaves associated with π_{i'} have been fixed³. We now proceed
according to the following two cases:

Case 1. π_i = u_j for some j ∈ {1, ..., n}. Then we set w(u_j) = 1 + 6 max{|T|^3 w(u_{j−1}), W_{u_j}}, where |T|
denotes the number of nodes of T.

Case 2. π_i = X_j for some j ∈ {1, ..., m}. Note that in this case the weights of the leaves s_{j1}, s_{j2}, s_{j3}
have already been fixed, to w(u_{j1}), w(u_{j2}) and w(u_{j3}) respectively. This is because we fix the weights
following the sequence Π and we have u_{j1} ≺ u_{j2} ≺ u_{j3} ≺ X_j. In order to define the weights of the
remaining leaves in 𝒳_j we set

$$w(a_{j1}) = \dots = w(a_{j4}) = W_{u_{j1}} + W_{u_{j2}} + W_{u_{j3}} + \sum_{k=1}^{3} \gamma(j, k)\, w(u_{jk}).$$

Finally, we set w(t_j) = w(a_{j1}) + w(X_j)/2, where w(X_j) = w(u_{j1}) + w(u_{j2}) + w(u_{j3}).

Remark 1. For each i = 1, ..., n + m, let w(π_i) denote the total weight of the leaves associated with π_i.
It is not hard to see that w(π_i) = O(|T|^{3i}). Therefore the maximum weight is not larger than
w(π_{m+n}) = O(|T|^{3(m+n)}). It follows that we can encode all the weights using O(3|T|(n + m) log |T|) bits;
hence the size of the instance (T, w) is polynomial in the size of the X3C instance I = (U, X).

Since t_m is the heaviest leaf, one can show that in an optimal search tree D* the root can only be q_{t_m}
or q_{r_m}; otherwise, moving one of these questions closer to the root of D* results in a tree with smaller
cost, contradicting the optimality of D*. Moreover, by a similar exchange argument, if q_{r_m}
is the root of D*, then its right subtree must coincide with a sequential search tree for t_m, s_{m3}, s_{m2}, s_{m1},
and its left subtree must be a sequential search tree for a_{m1}, ..., a_{m4}. Therefore the top levels of D*
coincide either with D_m^A or with D_m^B, or equivalently they are a realization of π_{m+n}. Repeating the same
argument on the remaining part of D* we obtain the following (the complete proof is in the appendix):

Lemma 1. Any optimal search tree for the instance (T, w) is a realization of Π w.r.t. some Y ⊆ X.

Recall now the definition of the search tree D^A. Let D* be an optimal search tree for (T, w), and let
Y ⊆ X be such that D* is a realization of Π w.r.t. Y. Equation (1) and the definition of w(t_i) yield



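$$\mathrm{cost}(D^A) - \mathrm{cost}(D^*) = \sum_{X_i \in Y} \Big( \frac{w(X_i)}{2} + \sum_{j=1}^{3} \big(\gamma(i,j) - d^*(q_{s_{ij}})\big)\, w(u_{ij}) \Big) = \sum_{j=1}^{n} \sum_{X_i \in Y :\, u_j \in X_i} \Big( \frac{w(u_j)}{2} + \tau(i,j)\, w(u_j) \Big) \qquad (2)$$

where τ(i, j) = γ(i, ι) − d*(q_{s_{iι}}) and ι ∈ {1, 2, 3} is such that u_{iι} = u_j.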

By definition, if for each j = 1, ..., n there exists exactly one X_i ∈ Y such that u_j ∈ X_i (i.e., Y is an exact
cover), then we have τ(i, j) = 0. Therefore, equation (2) evaluates exactly to Σ_{j=1}^{n} w(u_j)/2. Conversely, we can prove that
this never happens when, for some 1 ≤ j ≤ n, u_j appears in none or in more than one of the sets in Y.
For this we use the exponential (in |T|) growth of the weights w(u_j) and the fact that in such a case the



³ By the leaves associated with π_{i'} we mean the leaves in 𝒳_j if π_{i'} = X_j for some X_j ∈ X, or the leaves associated with
u if π_{i'} = u for some u ∈ U.
inner sum of the last expression in (2) is non-positive. In conclusion, we have the following result, whose
complete proof is in the appendix.

Lemma 2. Let D* be an optimal search tree for (T, w), and let Y ⊆ X be such that D* is a realization of Π
w.r.t. Y. Then cost(D*) ≤ cost(D^A) − (1/2) Σ_{u ∈ U} w(u) if and only if Y is a solution for the X3C
instance I = (U, X).

The NP-completeness of 3-bounded X3C [13], Remark 1 and Lemma 2 imply the following.

Theorem 1. The search tree problem is NP-complete in the class of trees of diameter at most 4.

Note that this result is tight. In fact, for trees of diameter at most 3 the problem is polynomially 
solvable, e.g., via dynamic programming (see Appendix). 
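For checking small cases (for instance, trees of diameter at most 3, or the output of the greedy algorithm of Section 3), the optimal cost can be found by exhaustive search. The Python sketch below is not the dynamic program referred to in the appendix: it memoizes over every connected search space and is exponential in general. It relies on the recurrence cost = w(S) + cost(left subtree) + cost(right subtree) for a search space S, which follows from the cost definition because the root query adds one to the depth of every leaf. The function name and the parent-map input format are our assumptions.

```python
from functools import lru_cache
from typing import Dict, FrozenSet


def optimal_cost(parent: Dict[str, str], w: Dict[str, int]) -> int:
    """Minimum cost of a search tree, by exhaustive memoized search.

    Exponential in general.  `parent` maps every non-root node to its parent;
    `w` must contain every node, including the root.
    """
    children: Dict[str, list] = {v: [] for v in w}
    for v, p in parent.items():
        children[p].append(v)

    def subtree(v: str) -> FrozenSet[str]:
        stack, seen = [v], set()
        while stack:
            x = stack.pop()
            seen.add(x)
            stack.extend(children[x])
        return frozenset(seen)

    below = {v: subtree(v) for v in w}

    @lru_cache(maxsize=None)
    def opt(space: FrozenSet[str]) -> int:
        if len(space) == 1:
            return 0
        best = min(
            opt(space & below[v]) + opt(space - below[v])
            for v in space
            if v in parent and parent[v] in space  # edge (v, parent(v)) still open
        )
        # the root query adds one level above every leaf: + w(space)
        return sum(w[x] for x in space) + best

    return opt(frozenset(w))


# Path r - a - b: asking about b first is optimal.
print(optimal_cost({"a": "r", "b": "a"}, {"r": 1, "a": 2, "b": 3}))  # 9
```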

NP-completeness for bounded-degree instances. We can adapt our proof to show that the search
tree problem is NP-complete also for bounded-degree trees. For that, we modify the input tree as
follows. We partition the sets of X so that sets that are adjacent in Π are put together. For the
instance in Example 1 the corresponding partition would be {{X_1}, {X_2}, {X_3, X_4}}.

Let Z = {Z_1, ..., Z_p} be the partition obtained from the input instance (U, X). Recall the definitions
of the subtrees T_j and the leaves a_{j1}, ..., a_{j4} (j = 1, ..., m) given for the construction of the tree T. We
now create a new tree T' as follows. For each i = 1, ..., p there is a subtree H_i that corresponds
to the element Z_i ∈ Z. H_i has root h_i. For each j such that X_j ∈ Z_i, we make the root of T_j, i.e., r_j, and
the leaves a_{j1}, ..., a_{j4} children of h_i. Finally, we create nodes z_1, ..., z_p, make h_1 a child of z_1 and, for
i = 2, ..., p, make z_{i−1} and h_i children of z_i. See Figure 5 for the tree corresponding to the instance
in Example 1.

The fact that no more than three elements of X appear consecutively in Π implies
that each Z_i contains at most three sets. Hence each h_i has at most 3 · 5 = 15 children plus its parent z_i,
while every other node has degree at most 5, so the maximum degree in T' is at most 16.
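The partition Z is simply the list of maximal runs of consecutive sets in Π. This Python sketch (hypothetical names `partition_runs` and `degree_bound`) computes Z for Example 1 and the resulting bound 5|Z_i| + 1 on the degree of h_i, which is at most 16 whenever each run has at most three sets.

```python
from typing import List, Union

Item = Union[str, frozenset]


def partition_runs(pi: List[Item]) -> List[List[frozenset]]:
    """Group the sets of Π into maximal runs of consecutive sets (the family Z)."""
    groups: List[List[frozenset]] = []
    current: List[frozenset] = []
    for item in pi:
        if isinstance(item, frozenset):
            current.append(item)
        elif current:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def degree_bound(groups: List[List[frozenset]]) -> int:
    """Largest degree of an h_i: 5 children per set in Z_i plus the parent z_i."""
    return max(5 * len(g) + 1 for g in groups)


X1, X2, X3, X4 = (frozenset(s) for s in ("abc", "bcd", "bef", "def"))
pi = ["a", "b", "c", X1, "d", X2, "e", "f", X3, X4]
Z = partition_runs(pi)
print([len(g) for g in Z], degree_bound(Z))  # [1, 1, 2] 11
```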

Regarding the weight function, we extend the weight function defined for the tree T by setting
w(h_i) = w(z_i) = 0 for each i = 1, ..., p, and leaving the other weights as before.

It turns out that Lemma 1 still holds for the new instance (T', w). In fact, in each subtree H_i the
structure of the instance is exactly the same as in the tree T, so one can prove that any optimal solution
for such a subinstance is a realization of the corresponding subsequence of Π. Moreover, because of the way
we partitioned X and of the weight function w, the smallest weight of an a_{jk} with X_j ∈ Z_i is bigger
than the total weight of the leaves associated with Z_1, ..., Z_{i−1}. This is enough to enforce the order of a realization of
Π, i.e., that the leaves t_j, a_{j1}, ..., a_{j4} are queried before the leaves associated with Z_1, ..., Z_{i−1}. We have proved the
following (a formal proof is in the appendix).

Lemma 3. Any optimal search tree for the instance (T', w) is a realization of Π w.r.t. some Y ⊆ X.

By using this lemma together with Lemma 2, we have that Theorem 1 also holds for bounded-degree
instances of the tree search problem.

3 Approximation Algorithms 

We need to introduce some notation. For any forest F of rooted trees and node j ∈ F, we denote by
F_j the subtree of F composed of j and all of its descendants. We denote the root of a tree T by r(T);
δ(u) denotes the number of children of u, and c_i(u) denotes the i-th child of u according to some
arbitrarily fixed order. The following operation will be useful for modifying search trees: given a search
tree D and a node u ∈ D, a left deletion of u is the operation that transforms D into a new search tree
by removing both u and its left subtree from D and then connecting the right subtree of u to the
parent of u (if it exists). A right deletion is defined analogously.

Given a search tree D for T, we use ℓ_u to denote the leaf of D assigned to the node u of T.
3.1 The natural greedy algorithm attains 2-approximation

Consider a search tree D for T. Notice that when we follow a path from the root of D to one of its
leaves, we reduce the search space (eliminate part of T) whenever we visit a new node. Therefore, we can
associate with each node of D the subtree of T which may still contain the node we search for. Notice
that the tree T' associated with node v ∈ D is exactly the one induced by the nodes of T that correspond
to the leaves of D_v; hence w(T') = w(D_v). E.g., in Figure 1 the node ⟨f⟩ of D is associated with the
subtree of T that is still consistent with the answers leading to ⟨f⟩.

We can transform a search tree D for T into a search tree D' for an arbitrary subtree T' of T. This
search tree is computed by taking each node v ∈ D assigned to a node A(v) in T − T' and applying a
left deletion if A(v) is an ancestor of r(T'), or a right deletion otherwise. The important property of this
construction is that the path r(D') ⇝ ℓ_x, for every x ∈ T', is exactly the subpath obtained by removing
all queries to nodes in T − T' from r(D) ⇝ ℓ_x. The next lemma formalizes this discussion.

Lemma 4. Consider a tree T and a search tree D for it. Let T' be a subtree of T. Then there is a
search tree D' for T' such that d(r(D'), ℓ_x) = d(r(D), ℓ_x) − n_x for every x ∈ T', where n_x is the number
of nodes in the path r(D) ⇝ ℓ_x assigned to nodes in T − T'.
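The construction behind Lemma 4 can be sketched in Python as follows. Queries to nodes outside T' are removed by a left deletion (keep the 'yes' branch) when the queried node is an ancestor of r(T'), and by a right deletion (keep the 'no' branch) otherwise. The data types and names are ours, and treating a query to r(T') itself like a query to an ancestor (inside T' it always answers 'yes') is our assumption about the intended boundary case.

```python
from dataclasses import dataclass
from typing import Dict, Set, Union


@dataclass
class Leaf:
    node: str


@dataclass
class Query:
    node: str           # asks: is the marked node in the subtree of `node`?
    no: "SearchTree"    # left subtree
    yes: "SearchTree"   # right subtree


SearchTree = Union[Leaf, Query]


def is_ancestor(a: str, b: str, parent: Dict[str, str]) -> bool:
    """True if a is b or an ancestor of b in T (parent maps child -> parent)."""
    while True:
        if a == b:
            return True
        if b not in parent:
            return False
        b = parent[b]


def restrict(d: SearchTree, sub: Set[str], sub_root: str,
             parent: Dict[str, str]) -> SearchTree:
    """Search tree for the subtree T' (node set `sub`, root `sub_root`) built from D."""
    if isinstance(d, Leaf):
        return d
    v = d.node
    if v in sub and v != sub_root:             # still a legal query inside T'
        return Query(v, restrict(d.no, sub, sub_root, parent),
                     restrict(d.yes, sub, sub_root, parent))
    if is_ancestor(v, sub_root, parent):       # T' lies below v: answer is "yes"
        return restrict(d.yes, sub, sub_root, parent)  # left deletion
    return restrict(d.no, sub, sub_root, parent)       # right deletion


# Path r - a - b; D asks about b first, then about a.
parent = {"a": "r", "b": "a"}
D = Query("b", no=Query("a", no=Leaf("r"), yes=Leaf("a")), yes=Leaf("b"))
print(restrict(D, {"a", "b"}, "a", parent))
# Query(node='b', no=Leaf(node='a'), yes=Leaf(node='b'))
```

In the example, the leaf of a loses one level (the removed query to a), while the leaf of b keeps its depth, as Lemma 4 predicts.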

We show that the natural greedy algorithm guarantees an approximation factor of 2. The algorithm
can be formulated in two sentences. (1) Let x be a non-root node such that |w(T_x) − w(T \ T_x)| is minimized, and set
A(r(D)) = x. (2) Construct the right and left subtrees of D by recursively applying the algorithm to T_x
and T \ T_x, respectively.
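A minimal Python sketch of this greedy rule follows. It returns only the cost of the greedy search tree, computed with the recurrence cost = w(S) + cost(right subtree) + cost(left subtree) for the current search space S. The parent-map input, the restriction of candidates to non-root nodes of the current search space, and tie-breaking by sorted node name are our own choices.

```python
from typing import Dict, FrozenSet


def greedy_cost(parent: Dict[str, str], w: Dict[str, int]) -> int:
    """Cost of the greedy search tree: query the node x minimising
    |w(T_x) - w(T minus T_x)|, then recurse on both parts."""
    children: Dict[str, list] = {v: [] for v in w}
    for v, p in parent.items():
        children[p].append(v)

    def below(v: str, space: FrozenSet[str]) -> FrozenSet[str]:
        """Nodes of the current search space in the subtree rooted at v."""
        stack, seen = [v], set()
        while stack:
            x = stack.pop()
            seen.add(x)
            stack.extend(c for c in children[x] if c in space)
        return frozenset(seen)

    def solve(space: FrozenSet[str]) -> int:
        if len(space) == 1:
            return 0
        total = sum(w[v] for v in space)
        best_gap, best_yes = None, None
        for x in sorted(space):                 # sorted: deterministic tie-breaking
            if x not in parent or parent[x] not in space:
                continue                        # x is the root of the current space
            yes = below(x, space)
            gap = abs(2 * sum(w[v] for v in yes) - total)  # |w(T_x) - w(T - T_x)|
            if best_gap is None or gap < best_gap:
                best_gap, best_yes = gap, yes
        # Right subtree searches T_x, left subtree searches T - T_x.
        return total + solve(best_yes) + solve(space - best_yes)

    return solve(frozenset(w))


print(greedy_cost({"a": "r", "b": "a"}, {"r": 1, "a": 2, "b": 3}))  # 9
```

On this path the greedy choice (query b first, since it splits the weight 3 against 3) happens to be optimal; in general the analysis below only guarantees a cost of at most twice the optimum.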

In order to prove that this algorithm yields a 2-approximation, we show that any search tree D*
can be turned into the greedy search tree D while the cost increases by at most cost(D*).

The proof is by induction on the number n of nodes of the input tree T. For the base case n = 1
there is nothing to show. Assume that the claim holds for any tree with at most n − 1 nodes. In order
to prove it for T we proceed in two steps.

Let x be the node queried at the root of D. Also let D*_0 (resp. D_0) and D*_1 (resp. D_1) be the search
trees for T_x and T \ T_x obtained from D* (resp. D) via Lemma 4. (a) Construct a search tree D' with
A(r(D')) = x and with left and right subtrees D*_1 and D*_0, respectively. It is not hard to see that D' is
a legal search tree. (b) Use the induction hypothesis to turn D*_0 and D*_1 into D_0 and D_1, respectively.
It is straightforward to see that the transformation results in the tree D.

Lemma 5. We have cost(D') ≤ cost(D*) + w(T)/2.

Proof sketch. Let x and x* be the nodes queried at the roots of D' and D*, respectively. W.l.o.g. we
assume x ≠ x*, as otherwise the lemma trivially holds. We can also assume that x* is a node of T_x,
because the opposite case is analyzed analogously.

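We shall first analyze the case w(T_x) < w(T − T_x), i.e., w(T_x) < w(T)/2. Every path from r(D*) to
a leaf ℓ_y with y ∈ T − T_x contains r(D*), and T − T_x does not contain x*. Hence Lemma 4 states that the depth of any leaf
in D*_1 is at least one smaller than it is in D*. The lemma also implies that the depth of any leaf in
D*_0 is not greater than it is in D*. So we have

$$\mathrm{cost}(D') = w(T) + \mathrm{cost}(D^*_0) + \mathrm{cost}(D^*_1) \le w(T) + \mathrm{cost}(D^*) - w(T - T_x) < \mathrm{cost}(D^*) + w(T)/2.$$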


The case w(T_x) ≥ w(T − T_x) requires a more involved analysis, which we defer to the appendix due to
space limitations. □

It follows that the cost of $D$ can be bounded from above by

$$\mathrm{cost}(D) = w(T) + \mathrm{cost}(D_0) + \mathrm{cost}(D_1) \le w(T) + 2\,\mathrm{cost}(D^*_0) + 2\,\mathrm{cost}(D^*_1) = 2\,\mathrm{cost}(D') - w(T) \le 2\,\mathrm{cost}(D^*).$$

The first inequality follows from the induction hypothesis and the second one is due to Lemma 5. We have proven the following result.

Theorem 2. The greedy strategy is a polynomial 2-approximation algorithm for the tree search problem. 
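Theorem 2 can be sanity-checked empirically on small instances (this is a check, not a proof). The sketch below is self-contained and compares the greedy cost with an exact optimum. The optimum comes from a memoized recursion over connected candidate sets: $\mathrm{OPT}(C) = w(C) + \min_x (\mathrm{OPT}(T_x \cap C) + \mathrm{OPT}(C - T_x))$. It assumes that the tree is given by a hypothetical `parent` dictionary (root mapped to `None`) and that weights are positive integers. Positive weights make the proof's assumption $w(T_x - T_{x_i}) > 0$ hold automatically. The exact recursion is exponential and only meant for trees with a handful of nodes.

```python
import random
from functools import lru_cache
from typing import Dict, FrozenSet, Iterator, Optional, Tuple


def greedy_and_optimal(parent: Dict[int, Optional[int]],
                       weight: Dict[int, int]) -> Tuple[int, int]:
    """Return (cost of the greedy search tree, cost of an optimal search tree)."""
    children: Dict[int, list] = {v: [] for v in parent}
    for c, p in parent.items():
        if p is not None:
            children[p].append(c)

    def below(comp: FrozenSet[int], x: int) -> FrozenSet[int]:
        # T_x restricted to the connected candidate set comp.
        out, stack = set(), [x]
        while stack:
            v = stack.pop()
            out.add(v)
            stack.extend(c for c in children[v] if c in comp)
        return frozenset(out)

    def splits(comp: FrozenSet[int]) -> Iterator[Tuple[FrozenSet[int], FrozenSet[int]]]:
        # One split (T_x, comp - T_x) per edge query available inside comp.
        for x in sorted(comp):
            if parent[x] in comp:
                t_x = below(comp, x)
                yield t_x, comp - t_x

    def w(s: FrozenSet[int]) -> int:
        return sum(weight[v] for v in s)

    @lru_cache(maxsize=None)
    def opt(comp: FrozenSet[int]) -> int:
        if len(comp) == 1:
            return 0
        return w(comp) + min(opt(a) + opt(b) for a, b in splits(comp))

    def greedy(comp: FrozenSet[int]) -> int:
        if len(comp) == 1:
            return 0
        a, b = min(splits(comp), key=lambda s: abs(w(s[0]) - w(s[1])))
        return w(comp) + greedy(a) + greedy(b)

    everything = frozenset(parent)
    return greedy(everything), opt(everything)


# Randomised check of OPT <= greedy <= 2 * OPT on small random trees.
rng = random.Random(0)
for _ in range(300):
    n = rng.randint(1, 8)
    parent = {0: None, **{v: rng.randrange(v) for v in range(1, n)}}
    weight = {v: rng.randint(1, 9) for v in range(n)}
    g, o = greedy_and_optimal(parent, weight)
    assert o <= g <= 2 * o, (parent, weight, g, o)
```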
3.2 An FPTAS for Searching in Bounded-Degree Trees

We now present an FPTAS for searching in trees with bounded degree. First, we devise a dynamic programming algorithm whose running time is exponential in the height of optimal search trees. Then we essentially argue that the height of optimal search trees is $O(\Delta(T) \cdot (\log w(T) + \log n))$, so the previous algorithm has a pseudo-polynomial running time. Finally, we employ a standard scaling technique to obtain an FPTAS.

We often construct a search tree starting with its 'left part'. In order to formally describe such
constructions, we define a left path as an ordered path where every node has only a left child. In 
addition, the left path of an ordered tree T is defined as the ordered path we obtain when we traverse T 
by only going to the left child, until we reach a node which does not have a left child. 

A dynamic programming algorithm. In order to find an optimal search tree in an efficient way, we need to define a family of auxiliary problems denoted by $\mathcal{P}^B(F, P)$. In the following paragraphs
we describe the essential structures needed in these subproblems and then we show how to use the 
subproblems to find an optimal search tree. 

First we introduce the concept of an extended search tree, which is basically a search tree with some 
extra nodes that have not been associated with a query yet (unassigned nodes) and some other nodes 
that cannot be associated with a query (blocked nodes). 

Definition 4. An extended search tree (EST) for a forest $F = (V, E)$ is a triple $D = (N, E', A)$, where $N$ and $E'$ are the nodes and edges of an ordered binary tree and the assignment $A : N \to V \cup \{\text{blocked}, \text{unassigned}\}$ simultaneously satisfy the following properties:

(a) For every node $v$ of $F$, $D$ contains both a leaf $\ell$ and an internal node $u$ such that $A(\ell) = A(u) = v$;

(b) For all $u, v \in N$ with $A(u), A(v) \in V$, the following holds: if $v$ is in the right subtree of $u$ then $A(v) \in T_{A(u)}$; if $v$ is in the left subtree of $u$ then $A(v) \notin T_{A(u)}$;

(c) If $u$ is a node in $D$ with $A(u) \in \{\text{blocked}, \text{unassigned}\}$, then $u$ does not have a right child.

If we drop (c) and also the requirement regarding internal nodes in (a), we obtain the definition of a search tree for $F$. The cost of an EST $D$ for $F$ is analogous to the cost of a search tree and is given by $\mathrm{cost}(D) = \sum d(r(D), u)\, w(A(u))$, where the summation is taken over all leaves $u \in D$ for which $A(u) \in F$.

At this point we establish a correspondence between optimal ESTs and optimal search trees. Given an EST $D$ for a tree $T$, we can apply a left deletion to the internal node of $D$ assigned to $r(T)$ and right deletions to all nodes of $D$ that are blocked or unassigned. This yields a search tree $D'$ with $\mathrm{cost}(D') \le \mathrm{cost}(D) - w(r(T))$. Conversely, we can add a node assigned to $r(T)$ to a search tree $D'$ and get an EST $D$ such that $\mathrm{cost}(D) \le \mathrm{cost}(D') + w(r(T))$. Employing these observations we can prove the following lemma:

Lemma 6. Any optimal EST for a tree $T$ can be converted into an optimal search tree for $T$ (in linear time). In addition, the existence of an optimal search tree of height $h$ implies the existence of an optimal EST of height $h + 1$.

So we can focus on obtaining optimal ESTs. First, we introduce concepts which serve as building blocks for ESTs. A partial left path (PLP) is a left path where every node is assigned (via a function $A$) to either blocked or unassigned. Now consider an EST $D$ and let $L = (l_1, \ldots, l_{|L|})$ be its left path. We say that $D$ is compatible with a PLP $P = (p_1, \ldots, p_{|P|})$ if $|P| = |L|$ and $A(p_i) = \text{blocked}$ implies $A(l_i) = \text{blocked}$. The tree in Figure 7(c) is compatible with the path of Figure 7(b).
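The left path and the compatibility test translate directly into code. The sketch below uses our own representation: an `ESTNode` dataclass whose label is a node of $F$ or one of the strings `"blocked"` / `"unassigned"`, and a PLP given simply as its list of labels. `ESTNode`, `left_path` and `is_compatible` are hypothetical names, and the example tree is only meant to exercise the definition.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

Label = Union[int, str]  # a node of F, or "blocked" / "unassigned"


@dataclass
class ESTNode:
    label: Label
    left: Optional["ESTNode"] = None
    right: Optional["ESTNode"] = None


def left_path(root: Optional[ESTNode]) -> List[Label]:
    """Labels met by repeatedly following left children, starting at the root."""
    labels: List[Label] = []
    node = root
    while node is not None:
        labels.append(node.label)
        node = node.left
    return labels


def is_compatible(root: ESTNode, plp: List[str]) -> bool:
    """D is compatible with P iff |P| = |L| and a blocked p_i forces a blocked l_i."""
    path = left_path(root)
    return len(path) == len(plp) and all(
        lab == "blocked" for lab, p in zip(path, plp) if p == "blocked"
    )


# Root queries node 5, its left child is blocked, and its right child is the leaf of 5.
est = ESTNode(5, left=ESTNode("blocked"), right=ESTNode(5))
print(is_compatible(est, ["unassigned", "blocked"]))  # True
```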

This definition of compatibility implies a natural one-to-one correspondence between the nodes of $L$ and $P$. Therefore, without ambiguity, we can use $p_i$ when referring to node $l_i$ and vice versa.

Now we can introduce our subproblem. First, fix a tree $T$ with $n$ nodes and a weight function $w$. We are given a forest $F = \{T_{c_1(u)}, T_{c_2(u)}, \ldots, T_{c_i(u)}\}$, a PLP $P$ and an integer $B$. The problem $\mathcal{P}^B(F, P)$ consists of finding an EST for $F$ of minimum cost among those ESTs for $F$ that are compatible with $P$ and have height at most $B$. Note that $F$ is not a general subforest of $T$. It consists of the subtrees rooted at the first $i$ children of some node $u \in T$, for some $1 \le i \le \delta(u)$.

Notice that if $P$ is a PLP where all nodes are unassigned and $|P|$ and $B$ are sufficiently large, then $\mathcal{P}^B(T, P)$ gives an optimal EST for $T$.

Algorithm for $\mathcal{P}^B(F, P)$. We have a base case and two other cases depending on the structure of $F$. In all these cases, although not explicitly stated, if $P$ does not contain unassigned nodes then the algorithm returns 'not feasible'. If during its execution the algorithm encounters a 'not feasible' subproblem, it ignores this choice in the enumeration.

Base case: $F$ has only one node $u$. In this case, the optimal solution for $\mathcal{P}^B(F, P)$ is obtained from $P$ by assigning its first unassigned node, say $p_i$, to $u$ and then adding a leaf assigned to $u$ as the right child of $p_i$. Its cost is $i \cdot w(u)$.
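The base case amounts to a single scan of $P$. In this sketch a PLP is a list of labels, and $p_1$, the root of the left path, sits at depth 0. The node $p_i$ is therefore at depth $i - 1$, and its right child, the leaf of $u$, is at depth $i$, which gives cost $i \cdot w(u)$. The function name `base_case` and the return convention, a pair (left-path labels, cost) or `None` for 'not feasible', are our own assumptions.

```python
from typing import List, Optional, Tuple, Union

Label = Union[int, str]


def base_case(plp: List[str], u: int, w_u: float) -> Optional[Tuple[List[Label], float]]:
    """Solve P^B({u}, P): return (left-path labels, cost), or None if not feasible."""
    for i, label in enumerate(plp, start=1):      # p_1 is the root, at depth 0
        if label == "unassigned":
            path: List[Label] = list(plp)
            path[i - 1] = u                       # p_i now queries u ...
            return path, i * w_u                  # ... and its right child (leaf of u) is at depth i
    return None                                   # no unassigned node: 'not feasible'


print(base_case(["blocked", "unassigned", "unassigned"], 7, 3))  # (['blocked', 7, 'unassigned'], 6)
```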

Case 1: $F$ is a forest $\{T_{c_1(u)}, \ldots, T_{c_i(u)}\}$. The idea of the algorithm is to decompose the problem into subproblems for the forests $T_{c_i(u)}$ and $F \setminus T_{c_i(u)}$. For that, it needs to select which nodes of $P$ will be assigned to each of these forests.

The algorithm considers all possible bipartitions of the unassigned nodes of $P$. For each bipartition $\mathcal{U} = (U^1, U^2)$ it computes an EST $D^{\mathcal{U}}$ for $F$ compatible with $P$. At the end, the algorithm returns the tree with smallest cost. The EST $D^{\mathcal{U}}$ is constructed as follows:

1. Let $P^1$ be the PLP constructed by starting with $P$ and then setting all the nodes in $U^2$ as blocked (Figure 6b). Similarly, let $P^2$ be the PLP constructed by starting with $P$ and setting all nodes in $U^1$ as blocked. Let $D^1$ and $D^2$ be optimal solutions for $\mathcal{P}^B(T_{c_i(u)}, P^1)$ and $\mathcal{P}^B(F \setminus T_{c_i(u)}, P^2)$, respectively (Figure 6c).

2. The EST $D^{\mathcal{U}}$ is computed by taking the 'union' of $D^1$ and $D^2$ (Figure 6d). More formally, the 'union' operation starts with the path $P$ and makes two kinds of replacement. (i) Every node in $P \cap U^1$ is replaced by the corresponding node in the left path of $D^1$ together with its right subtree. (ii) Every node in $P \cap U^2$ is replaced by the corresponding node in the left path of $D^2$ together with its right subtree.
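The enumeration in Case 1 and the left-path part of the 'union' can be sketched as follows. This is a minimal illustration with a PLP as a list of labels and the positions of $U^1$ as a set of 0-based indices (our own conventions). Only left-path labels are merged here. The right subtrees hanging off those nodes are carried along unchanged, so each leaf keeps its depth. The loop runs over $2^{\#\text{unassigned}}$ bitmasks, one per bipartition. `bipartitions` and `union_left_path` are hypothetical names.

```python
from typing import Iterator, List, Set, Tuple, Union

Label = Union[int, str]


def bipartitions(plp: List[str]) -> Iterator[Tuple[Set[int], List[str], List[str]]]:
    """Yield (U1, P1, P2) for every bipartition (U1, U2) of P's unassigned positions."""
    free = [i for i, lab in enumerate(plp) if lab == "unassigned"]
    for mask in range(1 << len(free)):
        u1 = {free[k] for k in range(len(free)) if (mask >> k) & 1}
        # P1: positions of U2 become blocked; P2: positions of U1 become blocked.
        p1 = ["blocked" if lab == "unassigned" and i not in u1 else lab
              for i, lab in enumerate(plp)]
        p2 = ["blocked" if i in u1 else lab for i, lab in enumerate(plp)]
        yield u1, p1, p2


def union_left_path(plp: List[str], u1: Set[int],
                    path1: List[Label], path2: List[Label]) -> List[Label]:
    """Left path of D^U: positions in U1 come from D^1, positions in U2 from D^2."""
    merged: List[Label] = []
    for i, lab in enumerate(plp):
        if lab == "blocked":
            merged.append("blocked")
        elif i in u1:
            merged.append(path1[i])
        else:
            merged.append(path2[i])
    return merged


plp = ["unassigned", "blocked", "unassigned"]
print(len(list(bipartitions(plp))))  # 4
```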

Notice that the height of every EST $D^{\mathcal{U}}$ is at most $B$; this implies that the algorithm returns a feasible solution for $\mathcal{P}^B(F, P)$. Also, the cost of $D^{\mathcal{U}}$ is given by $OPT(\mathcal{P}^B(T_{c_i(u)}, P^1)) + OPT(\mathcal{P}^B(F \setminus T_{c_i(u)}, P^2))$.

The optimality of the above procedure relies on the following fact. We can build an EST $D^1$ for $T_{c_i(u)}$ by starting from an optimal solution $D^*$ for $\mathcal{P}^B(F, P)$ and performing the following operation at each node $v$ of its left path: (i) if $v$ is unassigned, we assign it as blocked; (ii) if $v$ is assigned to a node in $F \setminus T_{c_i(u)}$, we assign it as blocked and remove its right subtree. We can construct an EST $D^2$ for $F \setminus T_{c_i(u)}$ analogously. Notice that $\mathrm{cost}(D^1) + \mathrm{cost}(D^2) = \mathrm{cost}(D^*)$. For a particular choice of $\mathcal{U}$, $D^1$ and $D^2$ are feasible for $\mathcal{P}^B(T_{c_i(u)}, P^1)$ and $\mathcal{P}^B(F \setminus T_{c_i(u)}, P^2)$. This completes the proof: the solution returned by the above algorithm costs at most $OPT(\mathcal{P}^B(T_{c_i(u)}, P^1)) + OPT(\mathcal{P}^B(F \setminus T_{c_i(u)}, P^2)) \le \mathrm{cost}(D^*)$.

Case 2: $F$ is a tree $T_v$. Let $p_i$ be an unassigned node of $P$ and let $t$ be an integer in the interval $[i + 1, B]$. The algorithm considers all possibilities for $p_i$ and $t$. For each, it computes an EST $D^{i,t}$ for $T_v$ of smallest cost satisfying the following: (i) $D^{i,t}$ is compatible with $P$; (ii) its height is at most $B$; (iii) the node of the left path of $D^{i,t}$ corresponding to $p_i$ is assigned to $v$; (iv) the leaf of $D^{i,t}$ assigned to $v$ is located at level $t$. The algorithm then returns the tree $D^{i,t}$ with minimum cost.

In order to compute $D^{i,t}$ the algorithm executes the following steps:

1. Let $P^i$ be the subpath of $P$ that starts at the first node of $P$ and ends at $p_i$. Let $P^{i,t}$ be the left path obtained by appending $t - i$ unassigned nodes to $P^i$ and assigning $p_i$ as blocked (Figure 7b). Compute an optimal solution $D'$ for $\mathcal{P}^B(\{T_{c_1(v)}, T_{c_2(v)}, \ldots, T_{c_{\delta(v)}(v)}\}, P^{i,t})$.

2. Let $p'_i$ be the node of $D'$ corresponding to $p_i$ and let $y'$ be the last node of the left path of $D'$ (Figure 7c). The tree $D^{i,t}$ is constructed by modifying $D'$ as follows (Figure 7d): make the left subtree of $p'_i$ its right subtree; assign $p'_i$ to $v$; add a leaf assigned to $v$ as the left child of $y'$. Finally, as a technical detail, add some blocked nodes to extend the left path of this structure until it has the same size as $P$.
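Step 1 of Case 2 and the enumeration of the pairs $(i, t)$ can be sketched as below. We assume a PLP is a list of labels and $i$ is 1-based, as in the text. The path $P^{i,t}$ has exactly $t$ nodes, so $y'$ sits at depth $t - 1$ and the leaf of $v$ added as its left child lands at level $t$. `case2_path` and `case2_choices` are hypothetical names, and the calls that would solve each $\mathcal{P}^B$ subproblem are omitted.

```python
from typing import Iterator, List, Optional, Tuple


def case2_path(plp: List[str], i: int, t: int, B: int) -> Optional[List[str]]:
    """Build P^{i,t}: p_1..p_i with p_i blocked, followed by t - i unassigned nodes."""
    if not (1 <= i <= len(plp) and plp[i - 1] == "unassigned" and i + 1 <= t <= B):
        return None
    prefix = list(plp[:i])
    prefix[i - 1] = "blocked"          # p_i is reserved for the query on v itself
    return prefix + ["unassigned"] * (t - i)


def case2_choices(plp: List[str], B: int) -> Iterator[Tuple[int, int, List[str]]]:
    """All pairs (i, t) enumerated in Case 2, each with its path P^{i,t}."""
    for i, lab in enumerate(plp, start=1):
        if lab == "unassigned":
            for t in range(i + 1, B + 1):
                yield i, t, case2_path(plp, i, t, B)


print(case2_path(["unassigned", "unassigned"], 1, 3, 4))  # ['blocked', 'unassigned', 'unassigned']
```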

It follows from properties (i) and (ii) of the trees $D^{i,t}$ that the above procedure returns a feasible solution for $\mathcal{P}^B(T_v, P)$. The proof of the optimality of this solution uses the same type of arguments as in Case 1 and is deferred to the appendix.

Computational complexity. Notice that it suffices to consider problems $\mathcal{P}^B(F, P)$ where $|P| \le B$, since all others are infeasible. We claim that, by employing a dynamic programming strategy, we can compute all these problems in $O(n^2 2^{2B})$ time. First, there are $O(n 2^B)$ such problems. This follows from two facts. For each node $u$ in $T$ there are two possible forests $F$ considered in subproblems ($F = T_u$ or $F = \{T_{c_1(u')}, T_{c_2(u')}, \ldots, T_{c_i(u')} = T_u\}$, where $u$ is the $i$-th child of $u'$). There are also $O(2^B)$ PLPs of size at most $B$. It is not difficult to see that each of these problems can be solved in $O(n + 2^B)$ time, so the claim holds.
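Spelled out, the two counts in the claim multiply as follows. This is a worked bound that uses only the counts stated above.

```latex
\underbrace{O\!\left(n\,2^{B}\right)}_{\text{number of subproblems}}
\cdot
\underbrace{O\!\left(n + 2^{B}\right)}_{\text{time per subproblem}}
= O\!\left(n^{2}\,2^{B} + n\,2^{2B}\right)
\subseteq O\!\left(n^{2}\,2^{2B}\right).
```

With $B = O(\Delta(T) \cdot (\log w(T) + \log n))$, the factor $2^{2B}$ equals $(n \cdot w(T))^{O(\Delta(T))}$. This is the source of the pseudo-polynomial running time for bounded degree.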

An upper bound on the height of optimal search trees. We now argue that there is an optimal search tree for $(T, w)$ whose height is $O(\Delta(T) \cdot (\log w(T) + \log n))$.

The following lemma is the core of our 'geometric decrease' argument. It essentially states that we 
can cut a constant factor of the total weight of an optimal search tree by going down a number of levels 
that only depends on the maximum degree of T. 

Lemma 7. Consider an instance $(T, w)$ for our search problem and let $D^*$ be an optimal search tree for it. Fix $0 < \alpha < 1$ and an integer $c \ge 3(\Delta(T) + 1)/\alpha$. Then, for every node $v^* \in D^*$ with $d(r(D^*), v^*) \ge c$ we have that $w(D^*_{v^*}) \le \alpha \cdot w(D^*)$.

Proof sketch. (The full proof is deferred to the appendix.) By means of contradiction, assume the lemma does not hold for some $v^*$ satisfying its conditions. Let $T'$ be the tree associated with $v^*$, rooted at node $r$. By hypothesis $T'$ contains a large portion of the total weight (greater than $\alpha \cdot w(D^*)$). We therefore create a search tree $D'$ which ensures that parts of $T'$ are queried closer to $r(D')$. The root of $D'$ is assigned to $r$. The left tree of $r(D')$ is a search tree for $T - T_r$ obtained via Lemma 4. In the right tree of $r(D')$ we build a left path containing nodes corresponding to queries for $c_1(r), c_2(r), \ldots, c_{\delta(r)}(r)$. Each of these has as right subtree a search tree for the corresponding $T_{c_i(r)}$, obtained via Lemma 4. Let $s$ be the number of nodes of $T - T_r$ queried on the path $r(D^*) \rightsquigarrow v^*$. Then Lemma 4 implies that $D'$ saves at least $s - (\Delta(T) + 1)$ queries for each node in $T'$ when compared to $D^*$. This gives the expression $\mathrm{cost}(D') \le \mathrm{cost}(D^*) - s \cdot w(T') + (\Delta(T) + 1)\, w(T)$. When $s \ge c/3$, the hypotheses on $c$ and $w(T')$ are enough to reach the contradiction $\mathrm{cost}(D') < \mathrm{cost}(D^*)$. The case $s < c/3$ is a little more involved but uses a similar construction. There, the role of $r$ is taken by a node inside $T'$ in order to obtain a more 'balanced' search tree. □
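The arithmetic behind the case $s \ge c/3$ can be written out as follows. We write $T'$ for the tree associated with $v^*$ and use $w(D^*) = w(T)$, i.e., the weight of a search tree is the total weight of its leaves. This is our reading of the sketch, not the appendix proof.

```latex
s \cdot w(T') \;\ge\; \frac{c}{3}\, w(T') \;>\; \frac{c}{3}\,\alpha\, w(T)
\;\ge\; (\Delta(T)+1)\, w(T)
\quad\Longrightarrow\quad
\mathrm{cost}(D') \;\le\; \mathrm{cost}(D^*) - s\, w(T') + (\Delta(T)+1)\, w(T) \;<\; \mathrm{cost}(D^*).
```

The strict step uses the contradiction hypothesis $w(T') > \alpha \cdot w(T)$. The last step of the chain uses $c \ge 3(\Delta(T)+1)/\alpha$, i.e., $c\alpha/3 \ge \Delta(T) + 1$.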



Assume that the weight function $w$ is strictly positive (see Appendix E.3 for the general case). Since $w$ is integral, employing Lemma 7 repeatedly shows that $D^*$ has height at most $O(\Delta(T) \cdot (\log w(T) + \log n))$.
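Here is one illustrative way to instantiate "repeatedly". It rests on our own assumptions: every node has weight at least 1, as in the text, and Lemma 7 applies to each subtree of $D^*$ reached every $c$ levels. Take $\alpha = 1/2$ and an integer $c = O(\Delta(T))$ satisfying the hypothesis of Lemma 7 (e.g., $c = 6(\Delta(T)+1) + 1$). Let $v_k$ be the node at depth $kc$ on a root-to-leaf path of $D^*$. Each block of $c$ levels at least halves the weight of the current subtree, while any subtree containing a leaf still weighs at least 1:

```latex
1 \;\le\; w\big(D^*_{v_k}\big) \;\le\; 2^{-k}\, w(T)
\;\Longrightarrow\;
k \le \log_2 w(T)
\;\Longrightarrow\;
\operatorname{height}(D^*) \;<\; c\,\big(\log_2 w(T) + 1\big) \;=\; O\big(\Delta(T)\,\log w(T)\big).
```

This stays within the $O(\Delta(T) \cdot (\log w(T) + \log n))$ bound used for $B$.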

From the DP algorithm to an FPTAS. By Lemmas 6 and 7 we can obtain an optimal search tree for $(T, w)$ in two steps. We find an optimal EST of height $B = O(\Delta(T) \cdot (\log w(T) + \log n))$ (via $\mathcal{P}^B$) and then convert it into an optimal search tree. The algorithm presented in the previous section achieves this in $O((n \cdot w(T))^{O(\Delta(T))})$ time, so we obtain a pseudo-polynomial time algorithm for trees with bounded degree. Furthermore, such an algorithm can be transformed into an FPTAS by scaling and rounding the weights just as in the well-known FPTAS for the knapsack problem [18] (see the appendix for details):
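The appendix containing the paper's exact rounding step is not part of this excerpt. What follows is only a sketch of the standard knapsack-style rounding, with a scale factor $K = \epsilon \cdot w_{\max} / n^2$ that we chose ourselves. Every root-to-leaf path has at most $n - 1$ queries, so $\sum_v d(v) < n^2$. An optimal tree for the rounded weights $w'(v) = \lfloor w(v)/K \rfloor$ is then worse than optimal for $w$ by less than $K n^2 = \epsilon\, w_{\max}$. For $n \ge 2$ every leaf has depth at least 1, so $w_{\max} \le \mathrm{OPT}$. Rounded weights are at most $n^2/\epsilon$, which makes the pseudo-polynomial algorithm run in $\mathrm{poly}(n/\epsilon)$. Rounding can create zero weights, the case the text defers to Appendix E.3. `round_weights` is a hypothetical name, and exact `Fraction` arithmetic avoids floating-point floor errors.

```python
import math
from fractions import Fraction
from typing import Dict, Tuple


def round_weights(weight: Dict[int, int], eps: Fraction) -> Tuple[Dict[int, int], Fraction]:
    """Knapsack-style rounding w'(v) = floor(w(v) / K) with K = eps * w_max / n^2.

    The choice of K is an assumption of this sketch, not taken from the paper.
    Requires at least one strictly positive weight.
    """
    n = len(weight)
    w_max = max(weight.values())
    scale = Fraction(eps) * w_max / (n * n)
    return {v: math.floor(Fraction(x) / scale) for v, x in weight.items()}, scale


rounded, K = round_weights({0: 10, 1: 5, 2: 1}, Fraction(1, 2))
print(rounded, K)  # {0: 18, 1: 9, 2: 1} 5/9
```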

Theorem 3. Consider an instance $(T, w)$ of our search problem where $\Delta(T) = O(1)$. Then there is a $\mathrm{poly}(n \cdot w(T))$-time algorithm for computing an optimal search tree for $(T, w)$. In addition, there is a $\mathrm{poly}(n/\epsilon)$-time algorithm for computing a $(1 + \epsilon)$-approximate search tree for $(T, w)$.
Appendix



A The proof of Lemma 1

We need two inequalities regarding the weights.
Fact 1. For each $1 \le i' < i \le m$ it holds that

$$w(t_i) \ge w(a_{i1}) > w(t_{i'}) + w(u_{i'1}) + w(u_{i'2}) + w(u_{i'3}) \qquad (3)$$

Proof of the fact. The first inequality follows by definition. In order to prove the second inequality, let us consider the difference

$$\mathrm{Diff} = w(a_{i1}) - \big(w(t_{i'}) + w(u_{i'1}) + w(u_{i'2}) + w(u_{i'3})\big).$$

By definition we have 

3 3 

Diff = (^-^. + 7(^, J>K-)) - E (^-.^. + (^(^''^') + ^/^)^i^^'3)) ■ 

Case 1. $u_{i3} = u_{i'3}$. Note that $\gamma(i,3) \ge 5 + \gamma(i',3)$. Since $W_{u_{ij}}, W_{u_{i'j}} \ge 0$ and $0 \le \gamma(i,j), \gamma(i',j) \le |T|$, we get that

$$\mathrm{Diff} \ge 5\,w(u_{i3}) - 3\,W_{u_{i3}} - (2|T| + 3)\, w(u_{i'2}).$$
Let $\ell$ be such that $u_{i3} = u_\ell$. It follows from the definition of the function $w(\cdot)$ that

$$w(u_{i3}) = w(u_\ell) = 1 + 6\max\{W_{u_\ell}, |T|^2\, w(u_{\ell-1})\} > 3\,W_{u_{i3}} + (2|T| + 3)\, w(u_{i'2}).$$

Thus, $\mathrm{Diff} > 0$.

Case 1/^/3 ^ ix^3. Then, it must also hold that X^/ ^ 1/^3. Therefore we have 

w{ai) > Wu,s > ^(^iO > ^(^iO + ^(^i^) + 'w(Uif2) + '^^(^^2^3)- 
Fact 2. For each $1 \le i \le m$ and $k = 1, \ldots, 4$, it holds that

$$w(a_{ik}) > 3\big(w(u_{i3}) + w(u_{i2}) + w(u_{i1})\big) + W_{u_{i3}} \qquad (4)$$

This follows directly from the definition of $w(a_{ik})$ and the fact that $\gamma(i,j) \ge 3$ ($j = 1, 2, 3$).

Proof of Lemma 1. Let $D$ be an optimal search tree for $(T, w)$.

Let $\ell$ be the deepest node in the left path of $D$ such that $D - D_\ell$ is the realization of $\pi_{z+1} \ldots \pi_{n+m}$ for some $z = 0, \ldots, n + m$. In particular, we take $z = n + m$ if $\ell$ is the root of $D$, i.e., no upper part of $D$ looks like a realization of a suffix of $\Pi$.

By contradiction, assume that $D$ is not a realization of $\Pi$; in particular $z > 0$. We shall modify $D_\ell$ so that its top part becomes a realization of $\pi_z$, and prove that this yields a new search tree with cost smaller than the cost of $D$. The desired result will follow by contradiction. We consider the following cases:

Case 1. $\pi_z = X_j$, for some $j = 1, 2, \ldots, m$. First we argue that $\ell \in \{q_{t_j}, q_{r_j}\}$. Let $q_\nu$ (for some $\nu \in T$) be the parent of $q_{r_j}$. If $\nu \in T_j$ we swap $q_{t_j}$ with $q_\nu$; otherwise we swap $q_{r_j}$ with $q_\nu$.¹ Let $D'$ be the new tree so obtained.

¹When swapping, we mean that the two nodes exchange positions, each carrying along its right subtree. This is possible because $q_{r_j}$ is the left child of $q_\nu$.
If $\nu$ is a leaf in $T$, then we have $\mathrm{cost}(D') \le \mathrm{cost}(D) - w(t_j) + w(\nu) < \mathrm{cost}(D)$, since $t_j$ is the leaf of largest weight in $D_\ell$. Otherwise, it must be that $\nu = r_{j'}$ for some $j' < j$. In this case, by (3), we have $\mathrm{cost}(D') \le \mathrm{cost}(D) - w(t_j) + w(t_{j'}) + w(u_{j'3}) + w(u_{j'2}) + w(u_{j'1}) < \mathrm{cost}(D)$. In either case we obtain a tree of smaller average cost than $D$, violating the optimality of $D$.

Alternatively, if $q_{t_j}$ is not the right child of $q_{r_j}$, then we swap $q_{t_j}$ with its parent. Note that $q_{t_j}$ must be the left child of its parent. By proceeding as above, we can prove that the resulting tree has cost smaller than $D$, again a violation of the optimality of $D$. Therefore, it must be that $\ell \in \{q_{t_j}, q_{r_j}\}$. We now split the analysis according to these two possible cases.

Subcase 1.1. $\ell = q_{r_j}$. Then, because of the assumption on $D - D_\ell$ and the search property, it follows that the right subtree of $q_{r_j}$ contains the nodes $q_{t_j}, q_{s_{j3}}, q_{s_{j2}}, q_{s_{j1}}$. Also, it is not hard to see that they must appear in this order. Otherwise, reordering them would decrease the average cost of $D$, since $w(t_j) > w(s_{j3}) > w(s_{j2}) > w(s_{j1})$. Therefore the right subtree of $\ell$ coincides with the right subtree of $D_f$.

Suppose now, w.l.o.g., that for each $k = 2, 3, 4$ it holds that $q_{a_{j,k-1}}$ is closer to the root of $D$ than $q_{a_{jk}}$. For the sake of contradiction, assume that $q_{a_{j1}}$ is not a child of $q_{r_j}$. Let $q_\nu$ be the parent of $q_{a_{j1}}$. Note that $q_{a_{j1}}$ can only be the left child of $q_\nu$. Swapping $q_{a_{j1}}$ with $q_\nu$ yields a tree with smaller expected cost than $D$, again contradicting the assumed optimality of $D$. Indeed, if $\nu$ is a leaf in $T$ then it follows from inequality (4) that $w(a_{j1}) > w(u_{j3}) > w(\nu)$. Otherwise, if $\nu = r_{j'}$ for some $j' < j$, then by (3) $w(a_{j1})$ is greater than the weight of the right subtree of $q_\nu$. The same arguments show that $q_{a_{jk}}$ is the left child of $q_{a_{j,k-1}}$, for each $k = 2, 3, 4$.

We can conclude that in the left path of $D_\ell$ the nodes following $\ell$ are exactly $q_{a_{j1}}, \ldots, q_{a_{j4}}$. Let $\ell'$ be the left child of $q_{a_{j4}}$. We have shown that in this subcase $D_\ell - D_{\ell'}$ coincides with a realization of $\pi_z$.

Subcase 1.2. $\ell = q_{t_j}$. There is nothing to prove about the right subtree of $\ell$. It remains to prove that in the left path of $D_\ell$ the node $\ell$ is followed by $q_{a_{j1}}, \ldots, q_{a_{j4}}$,² and we proceed as before. Assume (by contradiction) that $q_{a_{j1}}$ is not a child of $\ell$. Let $q_\nu$ be the parent of $q_{a_{j1}}$. Note that $q_{a_{j1}}$ can only be the left child of $q_\nu$. We swap $q_{a_{j1}}$ with $q_\nu$. Let $D'$ be the resulting search tree. Suppose first that $\nu$ is $r_j$ or a leaf in $T_j \setminus \{t_j\}$. Then $\mathrm{cost}(D') - \mathrm{cost}(D) \le -w(a_{j1}) + w(X_j) < 0$, where $w(X_j)$ accounts for the weight of the right subtree of $q_\nu$ and the last inequality follows by (4). On the other hand, suppose that $\nu$ is a leaf in $T$ or equals $r_{j'}$ for some $j' < j$. Then the same argument as in Subcase 1.1 leads to the same conclusion, i.e., we violate the optimality of $D$.

Therefore, we conclude that $q_{a_{j1}}$ is the left child of $\ell$. Repeating the same argument, we can also show that $q_{a_{jk}}$ is the left child of $q_{a_{j,k-1}}$, for each $k = 2, 3, 4$. Let $\ell'$ be the left child of $q_{a_{j4}}$. We have shown that in this subcase $D_\ell - D_{\ell'}$ coincides with a realization of $\pi_z$.

We can conclude that in both subcases of Case 1, the tree $D - D_{\ell'}$ is a realization of $\pi_z, \ldots, \pi_{n+m}$. This contradicts the assumption that $\ell$ is the deepest node for which such a condition holds.

Case 2. $\pi_z = u_j$ for some $j = 1, 2, \ldots, n$.

Let us consider the set $L$ of leaves of $D_\ell$ which are associated with $u_j$ and which are not queried in $D - D_\ell$. Since $D - D_\ell$ is a realization of $\pi_{z+1} \ldots \pi_{n+m}$, the leaves of $D_\ell$ which are not in $L$ and are queried in $D_\ell$ fall into two groups. They are either in $\bigcup_{X_i \not\ni u_j} X_i$ or associated with $u_{j'}$ for some $j' < j$. For the sake of contradiction, we assume that one of the first $|L|$ nodes in the left path of $D_\ell$ does not correspond to a leaf in $L$.

Let us construct a tree $D'$ from $D_\ell$ as follows. First, we construct an auxiliary tree by removing from $D_\ell$ all the nodes corresponding to the leaves in $L$. Then, we add a left path with these nodes to the top of this auxiliary tree. Our assumption that one of the first $|L|$ nodes in the left path of $D_\ell$ does not correspond to a leaf in $L$ implies that

²We are again assuming, w.l.o.g., that for each $k = 2, 3, 4$, it holds that $q_{a_{j,k-1}}$ is closer to the root of $D$ than $q_{a_{jk}}$.
$$\mathrm{cost}(D') \le \mathrm{cost}(D_\ell) - w(u_j) + |L| \sum w(X) + 3\,|L| \sum w(u)$$

The negative term in the inequality above comes from the levels of the nodes associated with $u_j$. Their sum is at least 7 in $D_\ell$, while it is exactly 6 in $D'$. The other terms are due to the fact that the level of a node can increase by at most $|L|$ units in our construction. The definitions of $W_{u_j}$ and $w(u_j)$ imply that

$$\mathrm{cost}(D') \le \mathrm{cost}(D_\ell) - w(u_j) + |L|\, W_{u_j} + |L| \cdot |T| \cdot w(u_{j-1})$$

Since $|L| \le 3$ and $w(u_j) > 6\max\{W_{u_j}, |T|^2\, w(u_{j-1})\}$, we get that $\mathrm{cost}(D') < \mathrm{cost}(D_\ell)$. This implies, however, that $D$ can be improved, a contradiction.

Thus, the $|L|$ top levels of $D_\ell$ coincide with a sequential search tree for $L$. Let $\ell'$ be the leftmost query of such a sequential search. Therefore, $D - D_{\ell'}$ is a realization of $\pi_z, \ldots, \pi_{n+m}$. In this Case 2 as well, this contradicts the hypothesis that $\ell$ is the deepest node for which such a condition holds.

The proof is complete. □ 



B The proof of Lemma 2

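Lemma 2. Let $D^*$ be an optimal binary search tree for $(T, w)$. Let $\mathcal{Y} \subseteq \mathcal{X}$ be such that $D^{\mathcal{Y}}$ is a realization of $\Pi$ w.r.t. $\mathcal{Y}$. We have that $\mathrm{cost}(D^*) \le \mathrm{cost}(D^{\mathcal{Y}}) - \frac{1}{2}\sum_{u \in U} w(u)$ if and only if $\mathcal{Y}$ is a solution for the X3C instance $I = (U, \mathcal{X})$.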

Proof. We start by proving the 'only if' part. Assume that $\mathrm{cost}(D^{\mathcal{Y}}) - \mathrm{cost}(D^*) \ge \frac{1}{2}\sum_{u \in U} w(u)$. We shall use induction on $j$ to prove that for each $j = n, \ldots, 1$ there exists exactly one $X \in \mathcal{Y}$ such that $u_j \in X$.

Fix $j^* \le n$ and assume that for every $j > j^*$ there exists exactly one $X \in \mathcal{Y}$ such that $u_j \in X$.

Suppose that there is no $i \in \{1, \ldots, m\}$ such that $u_{j^*} \in X_i \in \mathcal{Y}$. We can rewrite (2) as follows:

cost{D^)-cost{D*)^J2 E (^ + r(i,jMn,)y 

i=i x^ey ^ ^ 

Uj^Xi 

where $r(i, j) = \gamma(i, k) - d^*(q_{s_{ik}})$ and $k \in \{1, 2, 3\}$ is such that $u_{ik} = u_j$.

Now, since we are assuming that for all $j > j^*$ there exists only one $i$ such that $X_i \in \mathcal{Y}$ and $u_j \in X_i$, the definitions of $d^*(\cdot)$ and $\gamma(i, k)$ give $r(i, j) = 0$ for these $j$. So we obtain

costiD^) - costm = E ^ + E E + r(z, j>(.,)) , 

j>j* j<j* Xi^y ^ ^ 

Uj ^Xi 

where we also used the assumption that no $X \in \mathcal{Y}$ contains $u_{j^*}$, and therefore $u_{j^*}$ does not contribute to the sum.

Now we can observe that, for each $j < j^*$, there are at most 3 sets in $\mathcal{X}$ containing $u_j$. Moreover, $r(i, j)$, being a difference of levels in $D^*$, can be bounded by $|T|$. Also $w(u_j) \le w(u_{j^*})/(6|T|^2)$ for each $j < j^*$. Therefore, we have the desired contradiction:

cost{D^) - cost{D*) < J2 ^ + 3(i* - + 1/2)-^ < E ^ + ^ 2 E «'(^)- 

j>i* j>j* u&u 
Suppose now that there are $k > 1$ subsets in $\mathcal{Y}$ that contain $u_{j^*}$. Rewriting (2) as before, we obtain:



Let us observe that among the $k$ sets $X_i \in \mathcal{Y}$ such that $u_{j^*} \in X_i$, only one contributes a positive weight $w(u_{j^*})/2$, since for it $r(i, j^*) = 0$. The others give a negative contribution of at least $w(u_{j^*})/2$ each, since $r(i, j^*)$ becomes negative. Moreover, for $j < j^*$ we can repeat the argument used in the previous case. Therefore we obtain the desired contradiction:



cost{DA)-cost{D*) < J2^-^wiuj.) + 3{f-l)i\T\ + l/2)-^ < ^ < -Y.w{u). 

j>j* 3>j* ueu 

This concludes the inductive argument and the proof of the 'only if' part.

In order to prove the 'if' part of the statement, we notice that if $\mathcal{Y}$ is a solution for $I$ then for each $j = 1, \ldots, n$ there exists exactly one index $i$ such that $X_i \in \mathcal{Y}$ and $u_j \in X_i$. The desired result then follows directly from equation (2), because in this case the definitions of $d^*(\cdot)$ and $\gamma(\cdot, \cdot)$ yield $r(i, j) = 0$. □



C The proof of Lemma 3

Proof of Lemma 3. Let $D$ be an optimal search tree for the instance considered in Lemma 3.

Let $\ell$ be the deepest node in the left path of $D$ such that $D - D_\ell$ is the realization of $\pi_{z+1} \ldots \pi_{n+m}$ for some $z = 0, \ldots, n + m$. In particular, we take $z = n + m$ if $\ell$ is the root of $D$, i.e., no upper part of $D$ looks like a realization of some suffix of $\Pi$.

By contradiction, assume that $D$ is not a realization of $\Pi$, whence $z > 0$. We shall modify $D_\ell$ so that its top part becomes a realization of $\pi_z$, and prove that this yields a new search tree with cost smaller than the cost of $D$. The desired result will follow by contradiction. We consider the following cases:

Case 1. $\pi_z = X_j$, for some $j = 1, 2, \ldots, m$.

In this case, our assumption regarding £ implies that if a node jy G is associated with a leaf £^ in 
then £' either corresponds to an element u ^ U such that u Xj or £' G Xjf such that Xjf ^ Xj. Let 

Hi be such that Xj G We need to prove the following claim 

Claim 1. $\ell \in \{q_{t_j}, q_{r_j}\}$.

Proof. We shall show it by contradiction. We split the proof into cases I and II. 

Case I. Suppose that the node $q_{t_j}$ is the right child of $q_{r_j}$. Let $q_\nu$ (for some $\nu \in T$) be the parent of $q_{r_j}$. We have two cases, according to whether $q_{r_j}$ is a right or a left child of $q_\nu$.

Subcase La is a right child of Qj^. Note that because of the search tree property jy must be an 
ancestor of /z^ in T^. We perform a left rotation on q^y. Let be the new tree obtained. We have that 
cost{D') < cost{D) —w{tj)^w{a)^ where a is the left subtree of Qjy. We observe if a node in a corresponds 
to a leaf f then f must be in T^\H^. 

Thus, the nodes of a can take care of: 

(a) leaves that are associated to some u ^ such that u ujs- The sum of the weights of these 
leaves is at most \T\ • w{ujs)/Q\T\^ < w{ujs)/2; 

(b) at most two leaves associated with u ^ U such that u = Ujs- The fact that every u ^ U appears 
in at most three sets of X together with the fact that 5^3 G H,^ explain that we have at most two leaves; 
(c) leaves in Xjf such that Xjf Uj^. The sum of the weights of these leaves sum at most Wu^^-

Thus, we can conclude that w{q) < 2.5w{uj3) + Wujs- Since w{tj) > 2.5w{uj3) + Wujs we conclude 
that cost{D') < cost{D)^ contradicting the optimality of D. 

Subcase Lb g^. is a left child of Qi^. This implies that v is not an ancestor of rj in T^. Let be a 
tree obtained as follows: we swap with q^y if u is not in Tj; otherwise, we swap qt^ with q^y. Let a be 
the right subtree of qj^. Again, we have cost{D') < cost(D) — w{tj) + w(a). 

If z/ ^ then the analysis is identical to the one employed in Subcase La because a can take care of 
the same leaves considered in that case. 

If is a leaf in H,^ then w{tj) > w{v) — w{a) because tj is the heaviest leaf among the leaves in 
that corresponds to a node in D^. Finally, if z/ is an internal node in \ {^a^} then z/ = Vji for some 
y < j and it follows from inequality ([s]) that w{tj) > w{t'-) + w{ujfs) + w{uj>2) + w{uj>i) — w{a). 

In either subcase we obtain a tree of cost smaller than that of $D$, violating the optimality of $D$.

Case II. Alternatively, if $q_{t_j}$ is not the right child of $q_{r_j}$, then we can proceed as before. We consider both the case where $q_{t_j}$ is the right child of its parent and the case where it is the left child. In the former case we apply a left rotation and in the latter a simple swap. Again we can prove that the resulting tree has cost smaller than $D$, a violation of the optimality of $D$.

The proof of the claim is complete. □ 

Therefore, it must be that $\ell \in \{q_{t_j}, q_{r_j}\}$. We now split the analysis according to these two cases.
Subcase 1.1. $\ell = q_{r_j}$. Then, because of the assumption on $D - D_\ell$ and the search property, it follows that the right subtree of $q_{r_j}$ contains the nodes $q_{t_j}, q_{s_{j3}}, q_{s_{j2}}, q_{s_{j1}}$. Also, it is not hard to see that they must appear in this order. Otherwise, reordering them would decrease the average cost of $D$, since $w(t_j) > w(s_{j3}) > w(s_{j2}) > w(s_{j1})$. Therefore the right subtree of $\ell$ coincides with the right subtree of $D_f$.

Let us assume w.l.o.g that the level of g^^.^ is smaller than or equal to the level qa.^; in D, for k < k' . 
First, we argue that the left child of I must be g^ji- Assume that qa^^ is not the left child of I and let v 
be the parent of qa^^- We have two cases: 

A. qa-^ is a right child of v. 

We perform a left rotation on q^. Let D' be the new tree obtained. We have that cost{D') < 
cost(D) — w(qaji) + w{a) where a is the left subtree of u. Note that the search property assures that 
jy is an ancestor of /i^. Thus, the analysis of Subcase La in the above Claim 1, shows that the the 
sum of the weights of the leaves that a can takes care is upper bounded by 2.5w(ujs) + Wujs- Since 
^(Qaji) > 3'w(ujs) + Wujr^ we conclude that cost(D^) < cost(D). 

B. qaj-^ is a left child of u. In this case, we swap qaj-^ and u. Let be the new tree obtained. We 
have that cost{D') < cost{D) — w(qaj^) + w{a) where a is the right subtree of jy. Note that jy is not an 
ancestor of aji in T^. 

If z/ ^ the arguments employed in subcase LA shows that w{a) < 2.5w{ujs) + ^ujs- Since 
^{Qaji) > 3if;(?ij3) + Wujr^ we conclude that cost(D^) < cost(D). 

If z/ G Tj/, with Tjf G H,^ and / < j, it follows from inequality ([s]) that w{aji) > w{a). If z/ G Tj 
it follows from inequality Q that w{aji) > w{a). Finally, if z/ = aj^k with / < j we have that 
w{aji) > w{ajfk) — w{a) 

We can conclude that $q_{a_{j1}}$ is the left child of $\ell$. Since $w(a_{j1}) = w(a_{j2}) = w(a_{j3}) = w(a_{j4})$, the same arguments show that the nodes following $q_{a_{j1}}$ in the left path are $q_{a_{j2}}, q_{a_{j3}}, q_{a_{j4}}$. Let $\ell'$ be the left child of $q_{a_{j4}}$. We have shown that in this subcase $D_\ell - D_{\ell'}$ coincides with a realization of $\pi_z$.

Subcase 1.2. $\ell = q_{t_j}$. There is nothing to prove about the right subtree of $\ell$. It remains to prove that the nodes following $\ell$ in the left path of $D$ are exactly $q_{a_{j1}}, q_{a_{j2}}, q_{a_{j3}}$ and $q_{a_{j4}}$, and for this we can proceed as in Subcase 1.1. The argument by contradiction used there needs only one additional case: the parent of $q_{a_{j1}}$ is $q_{r_j}$. In this case we can employ the same argument we used for the analogous situation in Subcase 1.2 of the proof of Lemma 1. Let $\ell'$ be the left child of $q_{a_{j4}}$. We have shown that in this subcase $D_\ell - D_{\ell'}$ coincides with a realization of $\pi_z$.

We can conclude that in both Subcases 1.1 and 1.2, the tree $D - D_{\ell'}$ is a realization of $\pi_z, \ldots, \pi_{n+m}$. This contradicts the assumption that $\ell$ is the deepest node for which such a condition holds.

Case 2. $\pi_z = u_j$ for some $j = 1, 2, \ldots, n$.

The proof is identical to that employed for Case 2 of Lemma 1. □

D The proof of Lemma 5

Let $x$ and $x^*$ be the nodes queried at the roots of $D'$ and $D^*$, respectively. W.l.o.g. we assume $x \neq x^*$, as otherwise the lemma trivially holds. We can also assume that $x^*$ is a node from $T_x$, because the opposite case is analyzed analogously.

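Case 1: $w(T_x) \le w(T - T_x)$. In other words, $w(T_x) \le w(T)/2$. Every path from $r(D^*)$ to a leaf of $D^*$ passes through $r(D^*)$, whose query $x^*$ is not a node of $T - T_x$. Hence Lemma 4 states that the depth of any leaf in $D^*_1$ is smaller by at least one than it is in $D^*$. The lemma also implies that the depth of any leaf in $D^*_0$ is not greater than it is in $D^*$. Writing $\ell_v$ for the leaf of $D^*$ assigned to $v$, we have

$$\begin{aligned}
\mathrm{cost}(D') &= w(T) + \mathrm{cost}(D^*_0) + \mathrm{cost}(D^*_1) \\
&\le w(T) + \sum_{v \in T_x} w(v)\, d(r(D^*), \ell_v) + \sum_{v \in T - T_x} w(v)\, \big(d(r(D^*), \ell_v) - 1\big) \\
&= w(T) + \mathrm{cost}(D^*) - w(T - T_x) \le \mathrm{cost}(D^*) + w(T)/2 .
\end{aligned}$$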

Case 2: $w(T_x) > w(T - T_x)$. Let $x_1, \ldots, x_h$ be the nodes successively queried in $D^*$ on the path from $r(D^*)$ to the node querying $x$. In particular, $x_1 = x^*$ and $x_h = x$. Let $k < h$ be such that $x_i$ is a node from $T_x - \{x\}$ for $i = 1, \ldots, k$ and $x_{k+1} \notin T_x - \{x\}$.

In this extended abstract we assume that $w(T_x - T_{x_i}) > 0$ for $i = 1, \ldots, k$. The case $w(T_x - T_{x_i}) = 0$ can only occur when there is a tie regarding the choice of node $x$ in step (1) of the algorithm. In that case the above scenario can be avoided by employing a suitable tie-breaking rule. In the full paper we will show by a more intricate case analysis that the approximation factor holds regardless of the tie-breaking rule.

For $i = 1, \ldots, k$ we know that $w(T_{x_i}) \le w(T - T_{x_i})$. Otherwise, using the assumption that $w(T_x - T_{x_i}) > 0$, we would have

$$w(T_{x_i}) - w(T - T_{x_i}) = w(T_{x_i}) - w(T_x - T_{x_i}) - w(T - T_x) = w(T_x) - w(T - T_x) - 2\,w(T_x - T_{x_i}) < w(T_x) - w(T - T_x),$$

and so $x_i$ would have been chosen instead of $x$ in step (1) of the algorithm.

From this fact, it follows that $w(T_{x_i}) \le w(T - T_x)$ for $i = 1, \ldots, k$. Otherwise we would have

$$w(T_x) - w(T - T_x) = w(T_{x_i}) + w(T_x - T_{x_i}) - w(T - T_x) > w(T - T_x) + w(T_x - T_{x_i}) - w(T_{x_i}) = w(T - T_{x_i}) - w(T_{x_i}) \ge 0 .$$

Then $|w(T_{x_i}) - w(T - T_{x_i})| < w(T_x) - w(T - T_x)$, so $x_i$ would have been chosen instead of $x$ in step (1).

Let $T' := \bigcup_{i=1}^{k} T_{x_i}$ and let $T'' := T_x - T'$. Note that $T'$ is a forest in general and $T' \cup T'' = T_x$. We are going to reason about the search tree depths of the nodes in $T - T_x$, $T'$, and $T''$ separately.

$D_{T_x}$ queries all nodes from $T'$, and Lemma 4 states that the depth of those nodes is not greater in $D_{T_x}$ than it is in $D^*$.

The nodes from T" are as well all queried in D^. For these nodes we know that in L)* the node Xk^ri 
is queried before them. As x/e+i is not queried by D^, the depth of each node from T" in is by at 
least by one smaller than it is in I?*. 

Finally, the leaves in $D^*$ corresponding to the nodes from $T - T_x$ are descendants of the nodes in $D^*$ querying $x_1, \ldots, x_k$. These $k$ nodes are not contained in $D_{T-T_x}$, so the depth of each leaf in $D_{T-T_x}$ is at least $k$ smaller than it is in $D^*$. Combining the findings, we obtain

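$$\begin{aligned}
\mathrm{cost}(D') &\le w(T) + \sum_{v \in T'} w(v)\, d(r(D^*), l_v) + \sum_{v \in T''} w(v)\,\bigl(d(r(D^*), l_v) - 1\bigr) + \sum_{v \in T - T_x} w(v)\,\bigl(d(r(D^*), l_v) - k\bigr)\\
&= w(T) + \mathrm{cost}(D^*) - w(T'') - k\,w(T - T_x) .
\end{aligned}$$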

As $T'' = T - ((T - T_x) \cup T')$, we have $w(T'') = w(T) - w(T - T_x) - w(T')$, so

$$\mathrm{cost}(D') \le \mathrm{cost}(D^*) + w(T') - (k - 1)\,w(T - T_x) .$$

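We have argued above that $w(T_{x_i}) \le w(T - T_x)$ for $i = 1, \ldots, k$. Therefore, $w(T') = w\bigl(\bigcup_{i=1}^{k} T_{x_i}\bigr) \le \sum_{i=1}^{k} w(T_{x_i}) \le k\,w(T - T_x)$, and

$$\mathrm{cost}(D') \le \mathrm{cost}(D^*) + k\,w(T - T_x) - (k - 1)\,w(T - T_x) = \mathrm{cost}(D^*) + w(T - T_x) < \mathrm{cost}(D^*) + w(T)/2 .$$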

□ 
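Since $D'$ queries $x$ at its root, the bound $\mathrm{cost}(D') \le \mathrm{cost}(D^*) + w(T)/2$ also holds, a fortiori, for the tree that queries $x$ first and then searches $T_x$ and $T - T_x$ optimally. The Python sketch below checks this on small trees with an exponential brute-force solver; the edge-list input, the function names and the brute-force approach are illustrative assumptions, and the cost is $\sum_v w(v) \cdot (\text{number of queries to identify } v)$.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, List, Tuple

Edge = Tuple[int, int]


def make_solver(edges: List[Edge], w: Dict[int, float]):
    """Return (opt, component) for the tree given by `edges` and weights `w`."""
    adj: Dict[int, List[int]] = {v: [] for v in w}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    def component(start: int, nodes: FrozenSet[int], cut: Edge) -> FrozenSet[int]:
        # Nodes of `nodes` reachable from `start` without crossing edge `cut`.
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v in nodes and v not in seen and {u, v} != set(cut):
                    seen.add(v)
                    stack.append(v)
        return frozenset(seen)

    @lru_cache(maxsize=None)
    def opt(nodes: FrozenSet[int]) -> float:
        # Each query adds one to the depth of every leaf below it, hence the
        # recurrence opt(S) = w(S) + min over edges of opt(side) + opt(rest).
        if len(nodes) == 1:
            return 0.0
        weight = sum(w[v] for v in nodes)
        best = float("inf")
        for a, b in edges:
            if a in nodes and b in nodes:
                side = component(a, nodes, (a, b))
                best = min(best, weight + opt(side) + opt(nodes - side))
        return best

    return opt, component


def greedy_root_check(edges: List[Edge], w: Dict[int, float]):
    """Return (cost with greedy root and optimal subtrees, OPT, w(T))."""
    opt, component = make_solver(edges, w)
    nodes = frozenset(w)
    total = sum(w.values())
    # Most balanced edge: minimize |w(side) - w(rest)| = |2 w(side) - w(T)|.
    a, b = min(edges, key=lambda e: abs(2 * sum(w[v] for v in component(e[0], nodes, e)) - total))
    side = component(a, nodes, (a, b))
    return total + opt(side) + opt(nodes - side), opt(nodes), total
```

On the path $0 - 1 - 2 - 3 - 4$ with weights $5, 1, 1, 1, 5$, for instance, the greedy root costs 28 against an optimum of 26, within the additive $w(T)/2 = 6.5$.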

E An FPTAS for Searching in Bounded-Degree Trees 
E.1 Algorithm for $\mathcal{P}(F, P)$

In this section we complete the correctness proof of the proposed algorithm for solving $\mathcal{P}(F, P)$. It has already been argued in Section 3.2 that the algorithm always returns a feasible solution. In addition, in Case 1 of the algorithm, the returned solution is also optimal. Here we prove the optimality for the second case:

Case 2: $F$ is a tree $T_v$. Let $D^*$ be an optimal solution for $(T_v, P)$. Consider the internal node of $D^*$ assigned to $v$; since $D^*$ is compatible with $P$ and since this node belongs to the left path of $D^*$, it corresponds to a node $p_i$ of $P$. Thus, we denote this internal node of $D^*$ assigned to $v$ by $p'_i$. Let $l_v$ be the leaf of $D^*$ assigned to $v$ and notice that $l_v$ lies in the left path of the right subtree of $p'_i$. We construct $D'$ from $D^*$ by essentially applying the inverse of Step 2 of the algorithm: remove from $D^*$ the right subtree of $p'_i$; this removed subtree becomes the subtree of $p'_i$; assign $p'_i$ as blocked and remove $l_v$. (One can use Figures 7(d) and 7(c) to better visualize this construction.)

The tree $D'$ is actually an EST for the forest $\{T_{c_1(v)}, \ldots, T_{c_{\delta(v)}(v)}\}$ and has height at most $B$. Now construct $P'$ by taking the left path of $D'$, setting all the non-blocked nodes as unassigned and also setting every node after $p'_i$ as unassigned. Clearly $D'$ is compatible with $P'$ and thus feasible for $\mathcal{P}(\{T_{c_1(v)}, \ldots, T_{c_{\delta(v)}(v)}\}, P')$.

Notice, however, that $P'$ starts with the prefix of $P$ until $p_i$ (in terms of its assignment), then it has a blocked node corresponding to $p_i$ and then some unassigned nodes. Let $t$ be the number of nodes in $P'$. Since the last node of $P'$ comes from the parent of $l_v$ in $D^*$ and $D^*$ has height at most $B$, we have that $t \le B$. Thus, the path $P'$ coincides with the path $P^{i,t}$ constructed by the algorithm for this value of $t$.

It is easy to see that the tree $D^{i,t}$, as defined in the algorithm, has cost

$$\mathrm{opt}\bigl(\mathcal{P}(\{T_{c_1(v)}, \ldots, T_{c_{\delta(v)}(v)}\}, P^{i,t})\bigr) + t \cdot w(v) = \mathrm{opt}\bigl(\mathcal{P}(\{T_{c_1(v)}, \ldots, T_{c_{\delta(v)}(v)}\}, P')\bigr) + t \cdot w(v),$$

which is at most $\mathrm{cost}(D') + t \cdot w(v)$ due to the feasibility of $D'$. Finally, notice that this last quantity is actually the cost of $D^*$, so $\mathrm{cost}(D^{i,t}) \le \mathrm{cost}(D^*)$. Since the procedure returns a solution which is at least as good as $D^{i,t}$, its optimality follows.
E.2 Proof of Lemma 7

Suppose, for the sake of contradiction, that there is $v^* \in D^*$ with $d(r(D^*), v^*) \ge c$ but $w(D^*_{v^*}) > \alpha \cdot w(D^*)$. Let $\tilde{T}$ be the subtree of $T$ associated with $v^*$ and let $x$ be the root of $\tilde{T}$.

Let $y$ be a node in $T_x$ to be specified later. Let $T^0 = T - T_y$ and $T^i = T_{c_i(y)}$, for $i = 1, \ldots, \delta(y)$. Moreover, let $D^i$ be the search tree for $T^i$ obtained from $D^*$ via Lemma 1. We shall construct a new search tree $D'$ for $T$ as follows: the root of $D'$ is assigned to $y$; the left subtree of $r(D')$ is the search tree $D^0$; in the right subtree of $r(D')$ we build a left path containing nodes corresponding to queries for $c_1(y), \ldots, c_{\delta(y)}(y)$, and $D^i$ becomes the right subtree of the node querying $c_i(y)$.

It is easy to see that the cost of $D'$ is at most $\sum_{i=0}^{\delta(y)} \mathrm{cost}(D^i) + (\Delta(T) + 1) \cdot w(T)$. We claim that, for a suitable choice of $y$, $D'$ improves over $D^*$. For this, let $S$ be the set of nodes of $T_x$ which are queried in the path $r(D^*) \to v^*$. We distinguish the following cases.

Case 1: $|S| \ge 2c/3$. Set $y$ as a node in $T_x$ such that $|T_y \cap S| \ge |S|/2$ and $|T_{c_i(y)} \cap S| < |S|/2$ for every child $c_i(y)$ of $y$, and construct $D'$ as described previously. To find such a node $y$, traverse $T_x$ starting at its root and proceeding as follows: if $u$ is the current node, then move to the child $v$ of $u$ with largest $|T_v \cap S|$; the traversal ends when $|T_u \cap S| < |S|/2$. The parent of the node where the traversal ends is the desired $y$.
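The traversal that selects $y$ can be sketched in Python as follows, assuming (hypothetically) that $T_x$ is given as a dictionary `children` mapping each node to its list of children and that $S$ is a set of node identifiers. The counts $|T_v \cap S|$ are computed once, bottom-up, so the search runs in linear time.

```python
from typing import Dict, List, Set


def find_split_node(children: Dict[int, List[int]], root: int, S: Set[int]) -> int:
    """Descend from `root` to a node y with |T_y ∩ S| >= |S|/2 whose
    children c all satisfy |T_c ∩ S| < |S|/2."""
    # Preorder via an explicit stack; reversing it visits children before parents.
    order, stack = [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(children.get(u, []))
    cnt: Dict[int, int] = {}  # cnt[v] = |T_v ∩ S|
    for u in reversed(order):
        cnt[u] = (u in S) + sum(cnt[c] for c in children.get(u, []))
    half = len(S) / 2
    u = root  # cnt[root] = |S| >= |S|/2 because S is a subset of T_x
    while True:
        kids = children.get(u, [])
        if not kids:
            return u  # a leaf: the condition on children holds vacuously
        v = max(kids, key=lambda c: cnt[c])  # child with largest |T_v ∩ S|
        if cnt[v] < half:
            return u  # the traversal would stop at v, so its parent u is y
        u = v
```

The leaf case is a corner the text does not discuss; returning the leaf keeps both defining inequalities true.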

To bound the cost of $D'$ we first consider the cost of a particular tree $D^i$. From its construction we have that $d(r(D^i), l_u) \le d(r(D^*), l_u)$ for any node $u \in T^i$. Moreover, for any node $u \in T^i \cap \tilde{T}$ the path $r(D^*) \to l_u$ contains $v^*$ and therefore it contains $|S \setminus T^i|$ queries to nodes in $T_x \setminus T^i$. Since these nodes were removed in the construction of $D^i$, we have that for every $u \in T^i \cap \tilde{T}$

$$d(r(D^i), l_u) \le d(r(D^*), l_u) - |S \setminus T^i| \le d(r(D^*), l_u) - \frac{|S|}{2},$$

where the last inequality follows from the definition of $y$. It follows that

$$\sum_{i=0}^{\delta(y)} \mathrm{cost}(D^i) \le \sum_{u \in T - \{y\}} d(r(D^*), l_u) \cdot w(u) - \frac{|S| \cdot w(\tilde{T} - \{y\})}{2} .$$

Combining this bound with our upper bound on the cost of $D'$ we get that

$$\mathrm{cost}(D') \le \mathrm{cost}(D^*) - d(r(D^*), l_y) \cdot w(y) - \frac{|S| \cdot w(\tilde{T} - \{y\})}{2} + (\Delta(T) + 1) \cdot w(T) .$$

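We claim that actually $\mathrm{cost}(D') \le \mathrm{cost}(D^*) - \frac{|S| \cdot w(\tilde{T})}{2} + (\Delta(T) + 1) \cdot w(T)$. To see this, first suppose $y \in \tilde{T}$; then $d(r(D^*), l_y) \cdot w(y) \ge |S| \cdot w(y)$ and the claim holds. In the other case, where $y \notin \tilde{T}$, the claim follows from the fact that $w(\tilde{T} - \{y\}) = w(\tilde{T})$.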

By making use of this claim, the hypothesis on $|S|$ and the facts that $w(\tilde{T}) = w(D^*_{v^*}) > \alpha \cdot w(D^*) = \alpha \cdot w(T)$ and $c \cdot \alpha > 3(\Delta(T) + 1)$, we conclude that $D'$ improves over $D^*$, which is a contradiction.

Case 2: $|S| < 2c/3$. We set $y = x$ and construct $D'$ as described at the beginning of the proof.

Again, we are trying to reach the contradiction $\mathrm{cost}(D') < \mathrm{cost}(D^*)$. Recall that $\mathrm{cost}(D') \le \sum_{i=0}^{\delta(y)} \mathrm{cost}(D^i) + (\Delta(T) + 1) \cdot w(T)$, so we bound the costs of the trees $D^i$.

By construction we have that $\mathrm{cost}(D^0) \le \sum_{u \in T^0} d(r(D^*), l_u)\, w(u)$. Now consider some tree $D^i$ for $i \neq 0$. From its construction we have that $d(r(D^i), l_u) \le d(r(D^*), l_u)$ for any node $u \in T^i$. Moreover, for any node $u \in T^i \cap \tilde{T}$ the path $r(D^*) \to l_u$ contains $v^*$ and therefore it contains at least $c - |S|$ queries to nodes in $T - T_x = T^0$. Then Lemma 1 guarantees that for every $u \in T^i \cap \tilde{T}$ we have

$$d(r(D^i), l_u) \le d(r(D^*), l_u) - (c - |S|) .$$

Weighting these bounds over all nodes in $T$ we have:



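$$\begin{aligned}
\sum_{i=0}^{\delta(y)} \mathrm{cost}(D^i) &\le \sum_{i=0}^{\delta(y)} \sum_{u \in T^i} d(r(D^*), l_u)\, w(u) - \sum_{i=1}^{\delta(y)} \sum_{u \in T^i \cap \tilde{T}} (c - |S|) \cdot w(u)\\
&= \mathrm{cost}(D^*) - d(r(D^*), l_x)\, w(x) - (c - |S|) \cdot \bigl(w(\tilde{T}) - w(x)\bigr)\\
&\le \mathrm{cost}(D^*) - (c - |S|) \cdot w(\tilde{T}),
\end{aligned}$$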

where the last inequality is valid because $l_x$ is a descendant of $v^*$ in $D^*$, so that $d(r(D^*), l_x) \ge c$. Thus, by combining the upper bound on $\mathrm{cost}(D')$ with the previous display we get that $\mathrm{cost}(D') \le \mathrm{cost}(D^*) - (c - |S|) \cdot w(\tilde{T}) + (\Delta(T) + 1) \cdot w(T)$. By making use of the hypothesis $|S| < 2c/3$ and the facts that $w(\tilde{T}) = w(D^*_{v^*}) > \alpha \cdot w(D^*) = \alpha \cdot w(T)$ and $c \cdot \alpha > 3(\Delta(T) + 1)$, we conclude that $D'$ improves over $D^*$, which gives the desired contradiction.
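The closing comparison in the two cases can be written out as follows, assuming the case threshold on $|S|$ is $2c/3$ and the hypothesis relating the constants is $c \cdot \alpha > 3(\Delta(T)+1)$; since $w(\tilde{T}) > \alpha\, w(T) \ge 0$, all inequalities below are strict.

$$\begin{aligned}
\text{Case 1:}\quad \frac{|S|}{2}\, w(\tilde{T}) &\ge \frac{c}{3}\, w(\tilde{T}) > \frac{c\,\alpha}{3}\, w(T) > (\Delta(T)+1)\, w(T),\\
\text{Case 2:}\quad (c - |S|)\, w(\tilde{T}) &> \frac{c}{3}\, w(\tilde{T}) > \frac{c\,\alpha}{3}\, w(T) > (\Delta(T)+1)\, w(T).
\end{aligned}$$

In each case the saving exceeds the overhead $(\Delta(T)+1)\, w(T)$ of the new root and left path, so $\mathrm{cost}(D') < \mathrm{cost}(D^*)$.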

E.3 Proof of Theorem 3

The following lemma shows that the bound on the height of the shortest optimal tree holds even when the weight function is not strictly positive.

Lemma 8. There is an optimal search tree for $(T, w)$ of height at most $O(\Delta(T) \cdot (\log w(T) + \log n))$.

Proof. Consider an optimal search tree $D^*$ for $(T, w)$. Notice that for any $v \in D^*$, $D^*_v$ is an optimal search tree for the subtree of $T$ associated with $v$. So we can apply Lemma 7 repeatedly and obtain that for every node $v$ of $D^*$ at a level $\ell = O(\Delta(T) \cdot \log w(T))$, $w(D^*_v) = 0$.

Now let $L$ be the set of all nodes of $D^*$ at level $\ell$. For each $v \in L$ let $D^v$ be the shortest search tree for the subtree of $T$ associated with node $v$. It was proved in [5] that the height of $D^v$ can be upper bounded by $(\Delta(T) + 1) \cdot \log n$. Then we can construct the search tree $D'$ for $T$ as follows: start with $D^*$ and, for each $v \in L$, replace $D^*_v$ by $D^v$. Clearly $D'$ has height at most $O(\Delta(T) \cdot (\log w(T) + \log n))$. Moreover, since $w(D^v) = w(D^*_v) = 0$ for all $v \in L$, it follows that $D'$ has the same cost as $D^*$ and hence is optimal. □

Theorem 3. Consider an instance $(T, w)$ of our search problem where $\Delta(T) = O(1)$. Then there is an algorithm for computing an optimal search tree for $(T, w)$ that runs in $\mathrm{poly}(n \cdot w(T))$ time. In addition, there is an algorithm for computing a $(1 + \epsilon)$-approximate search tree for $(T, w)$ that runs in $\mathrm{poly}(n/\epsilon)$ time.

Proof. The existence of an exact pseudo-polynomial algorithm which runs in $\mathrm{poly}(n \cdot w(T))$ time follows from the discussion presented in Section 3.2 (see "From the DP algorithm to an FPTAS"). Thus, we only prove the second claim of the theorem, namely, that our search problem admits an FPTAS. We claim that the following procedure gives the desired FPTAS:

1. Let $W$ be the weight of the heaviest node of $T$, namely $W = \max_{u \in T}\{w(u)\}$. Define $K = \epsilon W / n^2$ and the weight function $w'$ such that $w'(u) = \lceil w(u)/K \rceil$ for every node $u \in T$.

2. Find an optimal search tree $D$ for $(T, w')$ using the pseudo-polynomial algorithm and return $D$.

First we analyze the running time of this procedure. Clearly Step 1 takes at most $O(n)$ time. In order to analyze Step 2, let $W' = \max_{u \in T}\{w'(u)\}$ and notice that $W' = \lceil W/K \rceil \le n^2/\epsilon + 1$. Thus, $w'(T) \le nW' \le n^3/\epsilon + n$. Then the pseudo-polynomial algorithm employed in Step 2 runs in $\mathrm{poly}(n \cdot w'(T)) = \mathrm{poly}(n/\epsilon)$ time. The running time of the whole procedure is then $\mathrm{poly}(n/\epsilon)$, as desired.
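Step 1 and the inequalities used in this analysis can be checked with a short Python sketch. It uses exact rational arithmetic (`fractions.Fraction`) so that the rounding $w'(u) = \lceil w(u)/K \rceil$ is not perturbed by floating-point error; the function names are hypothetical, nonnegative weights with $W > 0$ are assumed, and the pseudo-polynomial solver of Step 2 is not reproduced.

```python
from fractions import Fraction
from math import ceil
from typing import Dict, Tuple


def scale_weights(w: Dict[int, Fraction], eps: Fraction) -> Tuple[Fraction, Dict[int, int]]:
    """Step 1: K = eps * W / n^2 and w'(u) = ceil(w(u) / K)."""
    n = len(w)
    W = max(w.values())
    if W <= 0:
        raise ValueError("assumes at least one positive weight")
    K = eps * W / n**2
    return K, {u: ceil(wu / K) for u, wu in w.items()}  # ceil(Fraction) is an int


def bounds_hold(w: Dict[int, Fraction], eps: Fraction) -> bool:
    """Check the inequalities used in the running-time and approximation analysis."""
    K, ws = scale_weights(w, eps)
    n = len(w)
    return (
        all(w[u] <= K * ws[u] <= w[u] + K for u in w)  # w(u) <= K w'(u) <= w(u) + K
        and max(ws.values()) <= n**2 / eps + 1          # W' <= n^2/eps + 1
        and sum(ws.values()) <= n**3 / eps + n          # w'(T) <= n^3/eps + n
    )
```

With $n = 3$, $w = (7, 3, 1/2)$ and $\epsilon = 1/2$, the scale factor is $K = 7/18$ and the rounded weights are $(18, 8, 2)$, well within the bound $n^3/\epsilon + n = 57$ on $w'(T)$.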

Now we argue that the solution $D$ returned by the procedure is $(1 + \epsilon)$-approximate for the instance $(T, w)$. Let us make the weights explicit in the cost function, e.g., we denote by $\mathrm{cost}(D, w)$ and $\mathrm{cost}(D, w')$ the cost of $D$ with respect to the weights $w$ and $w'$. Thus we want to prove that $\mathrm{cost}(D, w) \le (1 + \epsilon)\,\mathrm{cost}(D^*, w)$, where $D^*$ is an optimal search tree for $(T, w)$.

Clearly for each node $u \in T$ we have $K \cdot w'(u) \le w(u) + K$ and hence

$$K \cdot \mathrm{cost}(D^*, w') \le \mathrm{cost}(D^*, w) + \sum_{u \in T} d(r(D^*), l_u) \cdot K \le \mathrm{cost}(D^*, w) + n^2 \cdot K = \mathrm{cost}(D^*, w) + \epsilon W,$$

where the last inequality follows from the fact that the distances are trivially upper bounded by $n$. Excluding the trivial case where $T$ consists of a single node, notice that every path in $D^*$ from $r(D^*)$ to a leaf has length at least one. Thus, $\mathrm{cost}(D^*, w)$ can be lower bounded by $W$, and the previous displayed inequality gives $K \cdot \mathrm{cost}(D^*, w') \le (1 + \epsilon)\,\mathrm{cost}(D^*, w)$. But since $w(u) \le K \cdot w'(u)$ for all $u \in T$, we have that

$$\mathrm{cost}(D, w) \le K \cdot \mathrm{cost}(D, w') \le K \cdot \mathrm{cost}(D^*, w') \le (1 + \epsilon)\,\mathrm{cost}(D^*, w),$$

where the second inequality follows from the optimality of $D$ with respect to $w'$. Therefore, $D$ is a $(1 + \epsilon)$-approximate search tree for the instance $(T, w)$, which concludes the proof of the theorem. □

F Polynomiality of the tree search problem for instances of diameter at most 3

First consider an instance $(T, w)$ of our search problem where $T$ has diameter two, i.e., it is a star. Let us root the star at its center. Employing a simple exchange argument it is easy to show that the children of $r(T)$ must be queried according to their weights, in decreasing order. Thus, an optimal search tree for $(T, w)$ can be built with any sorting algorithm in $O(n \log n)$ time.
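A minimal Python sketch of this construction follows; the function name and the input format (the center's weight plus a list of leaf weights) are hypothetical. It returns the query order and the cost $\sum_v w(v)\, d(r(D), l_v)$: the $t$-th queried leaf is found after $t$ queries, while the center is identified only after all $k$ leaves have been ruled out.

```python
from typing import List, Tuple


def star_search_order(center_weight: float, leaf_weights: List[float]) -> Tuple[List[int], float]:
    """Query order (leaf indices) and cost of the sorted strategy for a star.

    Leaves are queried by decreasing weight (ties keep input order). The t-th
    queried leaf is identified after t queries; the center is identified only
    once all k leaves have been ruled out, i.e. after k queries.
    """
    k = len(leaf_weights)
    order = sorted(range(k), key=lambda i: -leaf_weights[i])  # O(k log k)
    cost = sum((t + 1) * leaf_weights[i] for t, i in enumerate(order))
    return order, cost + k * center_weight
```

For a center of weight 0 with leaves of weights 2, 4 and 1, the order is (second, first, third) and the cost is $1 \cdot 4 + 2 \cdot 2 + 3 \cdot 1 = 11$.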

Now assume $T$ has diameter 3. Notice that the only possible structure for $T$ is the following: there are two nodes $r$ and $r'$ joined by an edge, and all other nodes are adjacent either to $r$ or to $r'$. In order to define the questions, let us take $r$ as the root. Let $l$ ($l'$) be the heaviest leaf among the children of $r$ ($r'$). It should not be difficult to see that the root of any optimal search tree must query one of the nodes in the set $\{r', l, l'\}$; this can be proved using a simple exchange argument. If $r(D)$ is assigned to $r'$, then its right subtree is an optimal search tree for $T_{r'}$ and its left subtree is an optimal search tree for $T - T_{r'}$. If $r(D)$ is assigned to $l$, then its right subtree is a leaf assigned to $l$ and its left subtree is an optimal search tree for $T - l$. Analogously, when $r(D)$ is assigned to $l'$, its right subtree is a leaf assigned to $l'$ and its left subtree is an optimal search tree for $T - l'$. Finally, notice that in the first case, both $T_{r'}$ and $T - T_{r'}$ have diameter at most 2.

Consider the recursion tree of the above procedure; notice that every subproblem $(T', w)$ has a specific structure: $T'$ is the subtree of $T$ induced by the nodes $r$, $r'$, the $i$ lightest leaf-children of $r$ and the $j$ lightest leaf-children of $r'$ (for some $i, j$). Employing a dynamic programming strategy together with an $O(n \log n)$ preprocessing for the two stars centered at $r$ and $r'$, it is not difficult to see that each of these $O(n^2)$ subproblems can be solved in $O(1)$ time. This gives an $O(n^2)$ algorithm for finding an optimal search tree for $(T, w)$.
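Relying on the claim above that an optimal root query lies in $\{r', l, l'\}$, the recursion can be tabulated over the pairs $(i, j)$. The Python sketch below is one possible instantiation: the function name and the input format (weights of $r$ and $r'$ plus the leaf weights attached to each) are hypothetical, and it returns only the optimal cost, not the tree. Sorting and prefix sums play the role of the $O(n \log n)$ preprocessing, after which each subproblem is evaluated in $O(1)$.

```python
from itertools import accumulate
from typing import List


def double_star_cost(wr: float, wr2: float, A: List[float], B: List[float]) -> float:
    """Optimal cost for a diameter-3 tree: edge r-r', leaf weights A at r, B at r'."""
    A, B = sorted(A), sorted(B)  # ascending: A[:i] are the i lightest leaves of r
    # Prefix sums: P[i] = sum of the first i weights, Q[i] = sum of s * weight[s] for s < i.
    PA = [0.0, *accumulate(A)]
    QA = [0.0, *accumulate(s * a for s, a in enumerate(A))]
    PB = [0.0, *accumulate(B)]
    QB = [0.0, *accumulate(s * b for s, b in enumerate(B))]

    def star(center: float, i: int, P: List[float], Q: List[float]) -> float:
        # Star with its i lightest leaves, queried heaviest first: the leaf of
        # rank s (ascending) ends at depth i - s and the center at depth i.
        return i * P[i] - Q[i] + i * center

    p, q = len(A), len(B)
    f = [[0.0] * (q + 1) for _ in range(p + 1)]  # f[i][j]: optimal cost of subproblem (i, j)
    for i in range(p + 1):
        for j in range(q + 1):
            weight = wr + wr2 + PA[i] + PB[j]  # w(T') for subproblem (i, j)
            # Root query r': both sides are stars, centred at r and at r'.
            best = weight + star(wr, i, PA, QA) + star(wr2, j, PB, QB)
            if i > 0:  # root query l, the heaviest remaining leaf of r (A[i-1])
                best = min(best, weight + f[i - 1][j])
            if j > 0:  # root query l', the heaviest remaining leaf of r' (B[j-1])
                best = min(best, weight + f[i][j - 1])
            f[i][j] = best
    return f[p][q]
```

For the path $l - r - r' - l'$ with weights $1, 0, 0, 1$, the sketch returns 3: querying $l$ first and then splitting $\{r, r', l'\}$ beats querying the middle edge, which costs 4.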
Figures




Figure 1: (left) The input tree T; (right) a search tree D for T 




Figure 2: The tree obtained from the instance $I = (\{a, b, c, d, e, f\}, \{X_1, X_2, X_3, X_4\})$ of 3-bounded X3C.




Figure 3: The two possible configurations we use for the part of the search tree that concerns the subtree 
Ti and the leaf and a sequential search tree for T^. 
Figure 4: Realization (left), and the (optimal) realization w.r.t. the exact cover $\{X_1, X_4\}$ (right); in bold are the questions involved in the configuration changes. Only the leaves associated with nodes of $T$ with non-zero weights are shown here.



Figure 5: The tree obtained from the instance $I = (\{a, b, c, d, e, f\}, \{X_1, X_2, X_3, X_4\})$ of 3-bounded X3C.







Figure 6: (a) PLP $P$ with partition $U = \{U', U''\}$ indicated. The blank nodes are unassigned and the black ones are blocked. (b) PLPs $P'$ and $P''$. (c) The optimal ESTs $D'$ and $D''$, and (d) the resulting EST constructed by taking the 'union' of $D'$ and $D''$.